Preface for Students


You are about to immerse yourself in serious mathematics, with an emphasis on 
attaining a deep understanding of the definitions, theorems, and proofs related to 
measure, integration, and real analysis. This book aims to guide you to the wonders 
of this subject. 

You cannot read mathematics the way you read a novel. If you zip through a page 
in less than an hour, you are probably going too fast. When you encounter the phrase
“as you should verify”, you should indeed do the verification, which will usually require
some writing on your part. When steps are left out, you need to supply the missing 
pieces. You should ponder and internalize each definition. For each theorem, you 
should seek examples to show why each hypothesis is necessary. 

Working on the exercises should be your main mode of learning after you have 
read a section. Discussions and joint work with other students may be especially 
effective. Active learning promotes long-term understanding much better than passive 
learning. Thus you will benefit considerably from struggling with an exercise and 
eventually coming up with a solution, perhaps working with other students. Finding 
and reading a solution on the internet will likely lead to little learning. 

As a visual aid, throughout this book definitions are in yellow boxes and theorems 
are in blue boxes, in both print and electronic versions. Each theorem has an informal 
descriptive name. The electronic version of this manuscript has links in blue. 

Please check the website below (or the Springer website) for additional information 
about the book. These websites link to the electronic version of this book, which is 
free to the world because this book has been published under Springer’s Open Access 
program. Your suggestions for improvements and corrections for a future edition are 
most welcome (send to the email address below). 

The prerequisite for using this book includes a good understanding of elementary 
undergraduate real analysis. You can download from the website below or from the 
Springer website the document titled Supplement for Measure, Integration & Real 
Analysis. That supplement can serve as a review of the elementary undergraduate real 
analysis used in this book. 

Best wishes for success and enjoyment in learning measure, integration, and real 
analysis! 


Sheldon Axler 

Mathematics Department 

San Francisco State University 
San Francisco, CA 94132, USA 


website: http://measure.axler.net 
e-mail: measure@axler.net
Twitter: @AxlerLinear 
Preface for Instructors


You are about to teach a course, or possibly a two-semester sequence of courses, on 
measure, integration, and real analysis. In this textbook, I have tried to use a gentle 
approach to serious mathematics, with an emphasis on students attaining a deep 
understanding. Thus new material often appears in a comfortable context instead 
of the most general setting. For example, the Fourier transform in Chapter 11 is 
introduced in the setting of R rather than R^n so that students can focus on the main
ideas without the clutter of the extra bookkeeping needed for working in R^n.

The basic prerequisite for your students to use this textbook is a good understanding
of elementary undergraduate real analysis. Your students can download from the
book’s website (http://measure.axler.net) or from the Springer website the document 
titled Supplement for Measure, Integration & Real Analysis. That supplement can 
serve as a review of the elementary undergraduate real analysis used in this book. 

As a visual aid, throughout this book definitions are in yellow boxes and theorems 
are in blue boxes, in both print and electronic versions. Each theorem has an informal 
descriptive name. The electronic version of this manuscript has links in blue. 

Mathematics can be learned only by doing. Fortunately, real analysis has many 
good homework exercises. When teaching this course, during each class I usually 
assign as homework several of the exercises, due the next class. I grade only one 
exercise per homework set, but the students do not know ahead of time which one. I 
encourage my students to work together on the homework or to come to me for help. 
However, I tell them that getting solutions from the internet is not allowed and would 
be counterproductive for their learning goals. 

If you go at a leisurely pace, then covering Chapters 1-5 in the first semester may 
be a good goal. If you go a bit faster, then covering Chapters 1-6 in the first semester 
may be more appropriate. For a second-semester course, covering some subset of 
Chapters 6 through 12 should produce a good course. Most instructors will not have 
time to cover all those chapters in a second semester; thus some choices need to 
be made. The following chapter-by-chapter summary of the highlights of the book 
should help you decide what to cover and in what order: 


• Chapter 1: This short chapter begins with a brief review of Riemann integration.
Then a discussion of the deficiencies of the Riemann integral helps motivate the 
need for a better theory of integration. 


• Chapter 2: This chapter begins by defining outer measure on R as a natural
extension of the length function on intervals. After verifying some nice properties
of outer measure, we see that it is not additive. This observation leads to restricting
our attention to the σ-algebra of Borel sets, defined as the smallest σ-algebra on R
containing all the open sets. This path leads us to measures.
After dealing with the properties of general measures, we come back to the setting
of R, showing that outer measure restricted to the σ-algebra of Borel sets is
countably additive and thus is a measure. Then a subset of R is defined to be 
Lebesgue measurable if it differs from a Borel set by a set of outer measure 0. This 
definition makes Lebesgue measurable sets seem more natural to students than the 
other competing equivalent definitions. The Cantor set and the Cantor function 
then stretch students’ intuition. 


Egorov’s Theorem, which states that pointwise convergence of a sequence of 
measurable functions is close to uniform convergence, has multiple applications in 
later chapters. Luzin’s Theorem, back in the context of R, sounds spectacular but 
has no other uses in this book and thus can be skipped if you are pressed for time. 


• Chapter 3: Integration with respect to a measure is defined in this chapter in a
natural fashion first for nonnegative measurable functions, and then for real-valued 
measurable functions. The Monotone Convergence Theorem and the Dominated 
Convergence Theorem are the big results in this chapter that allow us to interchange 
integrals and limits under appropriate conditions. 


• Chapter 4: The highlight of this chapter is the Lebesgue Differentiation Theorem,
which allows us to differentiate an integral. The main tool used to prove this
result cleanly is the Hardy–Littlewood maximal inequality, which is interesting
and important in its own right. This chapter also includes the Lebesgue Density 
Theorem, showing that a Lebesgue measurable subset of R has density 1 at almost 
every number in the set and density 0 at almost every number not in the set. 


• Chapter 5: This chapter deals with product measures. The most important results
here are Tonelli’s Theorem and Fubini’s Theorem, which allow us to evaluate
integrals with respect to product measures as iterated integrals and allow us to
change the order of integration under appropriate conditions. As an application of
product measures, we get Lebesgue measure on R^n from Lebesgue measure on R.
To give students practice with using these concepts, this chapter finds a formula for
the volume of the unit ball in R^n. The chapter closes by using Fubini’s Theorem to
give a simple proof that a mixed partial derivative with sufficient continuity does 
not depend upon the order of differentiation. 


• Chapter 6: After a quick review of metric spaces and vector spaces, this chapter
defines normed vector spaces. The big result here is the Hahn–Banach Theorem
about extending bounded linear functionals from a subspace to the whole space. 
Then this chapter introduces Banach spaces. We see that completeness plays 
a major role in the key theorems: Open Mapping Theorem, Inverse Mapping 
Theorem, Closed Graph Theorem, and Principle of Uniform Boundedness. 


• Chapter 7: This chapter introduces the important class of Banach spaces L^p(μ),
where 1 ≤ p ≤ ∞ and μ is a measure, giving students additional opportunities to
use results from earlier chapters about measure and integration theory. The crucial
results called Hölder’s inequality and Minkowski’s inequality are key tools here.
This chapter also shows that the dual of ℓ^p is ℓ^{p′} for 1 ≤ p < ∞.


Chapters 1 through 7 should be covered in order, before any of the later chapters.
After Chapter 7, you can cover Chapter 8 or Chapter 12. 
• Chapter 8: This chapter focuses on Hilbert spaces, which play a central role in
modern mathematics. After proving the Cauchy–Schwarz inequality and the Riesz
Representation Theorem that describes the bounded linear functionals on a Hilbert
space, this chapter deals with orthonormal bases. Key results here include Bessel’s
inequality, Parseval’s identity, and the Gram–Schmidt process.


• Chapter 9: Only positive measures have been discussed in the book up until this
chapter. In this chapter, real and complex measures get consideration. These concepts
lead to the Banach space of measures, with total variation as the norm. Key
results that help describe real and complex measures are the Hahn Decomposition
Theorem, the Jordan Decomposition Theorem, and the Lebesgue Decomposition
Theorem. The Radon–Nikodym Theorem is proved using von Neumann’s slick
Hilbert space trick. Then the Radon–Nikodym Theorem is used to prove that the
dual of L^p(μ) can be identified with L^{p′}(μ) for 1 < p < ∞ and μ a (positive)
measure, completing a project that started in Chapter 7.


The material in Chapter 9 is not used later in the book. Thus this chapter can be 
skipped or covered after one of the later chapters. 


• Chapter 10: This chapter begins by discussing the adjoint of a bounded linear
map between Hilbert spaces. Then the rest of the chapter presents key results 
about bounded linear operators from a Hilbert space to itself. The proof that each 
bounded operator on a complex nonzero Hilbert space has a nonempty spectrum 
requires a tiny bit of knowledge about analytic functions. Properties of special 
classes of operators (self-adjoint operators, normal operators, isometries, and 
unitary operators) are described. 


Then this chapter delves deeper into compact operators, proving the Fredholm 
Alternative. The chapter concludes with two major results: the Spectral Theorem 
for compact operators and the popular Singular Value Decomposition for compact 
operators. Throughout this chapter, the Volterra operator is used as an example to 
illustrate the main results. 


Some instructors may prefer to cover Chapter 10 immediately after Chapter 8, 
because both chapters live in the context of Hilbert space. I chose the current order 
to give students a breather between the two Hilbert space chapters, thinking that 
being away from Hilbert space for a little while and then coming back to it might 
strengthen students’ understanding and provide some variety. However, covering 
the two Hilbert space chapters consecutively would also work fine. 


• Chapter 11: Fourier analysis is a huge subject with a two-hundred-year history.
This chapter gives a gentle but modern introduction to Fourier series and the 
Fourier transform. 


This chapter first develops results in the context of Fourier series, but then comes 
back later and develops parallel concepts in the context of the Fourier transform. 
For example, the Fourier coefficient version of the Riemann–Lebesgue Lemma is
proved early in the chapter, with the Fourier transform version proved later in the 
chapter. Other examples include the Poisson kernel, convolution, and the Dirichlet 
problem, all of which are first covered in the context of the unit disk and unit circle; 
then these topics are revisited later in the context of the half-plane and real line. 
Convergence of Fourier series is proved in the L^2 norm and also (for sufficiently
smooth functions) pointwise. The book emphasizes getting students to work with 
the main ideas rather than on proving all possible results (for example, pointwise 
convergence of Fourier series is proved only for twice continuously differentiable 
functions rather than using a weaker hypothesis). 


The proof of the Fourier Inversion Formula is the highlight of the material on the 
Fourier transform. The Fourier Inversion Formula is then used to show that the 
Fourier transform extends to a unitary operator on L^2(R).


This chapter uses some basic results about Hilbert spaces, so it should not be 
covered before Chapter 8. However, if you are willing to skip or hand-wave 
through one result that helps describe the Fourier transform as an operator on 
L^2(R) (see 11.87), then you could cover this chapter without doing Chapter 10.


• Chapter 12: A thorough coverage of probability theory would require a whole
book instead of a single chapter. This chapter takes advantage of the book’s earlier 
development of measure theory to present the basic language and emphasis of 
probability theory. For students not pursuing further studies in probability theory, 
this chapter gives them a good taste of the subject. Students who go on to learn 
more probability theory should benefit from the head start provided by this chapter 
and the background of measure theory. 


Features that distinguish probability theory from measure theory include the 
notions of independent events and independent random variables. In addition to 
those concepts, this chapter discusses standard deviation, conditional probabilities, 
Bayes’ Theorem, and distribution functions. The chapter concludes with a proof of 
the Weak Law of Large Numbers for independent identically distributed random 
variables. 


You could cover this chapter anytime after Chapter 7. 


Please check the website below (or the Springer website) for additional information
about the book. These websites link to the electronic version of this book, which is
free to the world because this book has been published under Springer’s Open Access
program. Your suggestions for improvements and corrections for a future edition are
most welcome (send to the email address below).


I enjoy keeping track of where my books are used as textbooks. If you use this
book as the textbook for a course, please let me know.


Best wishes for teaching a successful class on measure, integration, and real
analysis!

Sheldon Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132, USA

Contact the author, or Springer if the author is not available, for permission
for translations or other commercial re-use of the contents of this book.


website: http://measure.axler.net 
e-mail: measure@axler.net
Twitter: @AxlerLinear 
Chapter 1


Riemann Integration 


This brief chapter reviews Riemann integration. Riemann integration uses rectangles 
to approximate areas under graphs. This chapter begins by carefully presenting 
the definitions leading to the Riemann integral. The big result in the first section 
states that a continuous real-valued function on a closed bounded interval is Riemann 
integrable. The proof depends upon the theorem that continuous functions on closed 
bounded intervals are uniformly continuous. 

The second section of this chapter focuses on several deficiencies of Riemann 
integration. As we will see, Riemann integration does not do everything we would 
like an integral to do. These deficiencies provide motivation in future chapters for the 
development of measures and integration with respect to measures. 


Digital sculpture of Bernhard Riemann (1826-1866), 
whose method of integration is taught in calculus courses. 
©Doris Fiebig 


© Sheldon Axler 2020
S. Axler, Measure, Integration & Real Analysis, Graduate Texts in Mathematics 282, https://doi.org/10.1007/978-3-030-33143-6_1




1A Review: Riemann Integral 


We begin with a few definitions needed before we can define the Riemann integral. 
Let R denote the complete ordered field of real numbers. 


1.1 Definition partition


Suppose $a, b \in \mathbf{R}$ with $a < b$. A partition of $[a,b]$ is a finite list of the form $x_0, x_1, \dots, x_n$, where

$$a = x_0 < x_1 < \cdots < x_n = b.$$


We use a partition $x_0, x_1, \dots, x_n$ of $[a,b]$ to think of $[a,b]$ as a union of closed subintervals, as follows:

$$[a,b] = [x_0, x_1] \cup [x_1, x_2] \cup \cdots \cup [x_{n-1}, x_n].$$


The next definition introduces clean notation for the infimum and supremum of 
the values of a function on some subset of its domain. 


1.2 Definition notation for infimum and supremum of a function 


If f is a real-valued function and A is a subset of the domain of f, then 


$$\inf_A f = \inf\{f(x) : x \in A\} \quad\text{and}\quad \sup_A f = \sup\{f(x) : x \in A\}.$$


The lower and upper Riemann sums, which we now define, approximate the 
area under the graph of a nonnegative function (or, more generally, the signed area 
corresponding to a real-valued function). 


1.3 Definition lower and upper Riemann sums 


Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function and $P$ is a partition $x_0, \dots, x_n$ of $[a,b]$. The lower Riemann sum $L(f,P,[a,b])$ and the upper Riemann sum $U(f,P,[a,b])$ are defined by

$$L(f,P,[a,b]) = \sum_{j=1}^{n} (x_j - x_{j-1}) \inf_{[x_{j-1},\,x_j]} f$$

and

$$U(f,P,[a,b]) = \sum_{j=1}^{n} (x_j - x_{j-1}) \sup_{[x_{j-1},\,x_j]} f.$$
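For readers who like to experiment, here is a minimal Python sketch of Definition 1.3 using exact rational arithmetic. It works only under a simplifying assumption: it treats $f$ as increasing on $[a,b]$, so the infimum over $[x_{j-1}, x_j]$ is $f(x_{j-1})$ and the supremum is $f(x_j)$. For a general bounded $f$ the infimum and supremum cannot simply be read off at the endpoints. The names `lower_sum` and `upper_sum` are hypothetical, not from the text.

```python
from fractions import Fraction
from typing import Callable, Sequence


def _check_partition(xs: Sequence[Fraction]) -> None:
    # A partition of [a, b] is a strictly increasing list a = x_0 < ... < x_n = b.
    if len(xs) < 2 or any(xs[j - 1] >= xs[j] for j in range(1, len(xs))):
        raise ValueError("partition must be strictly increasing with at least two points")


def lower_sum(f: Callable[[Fraction], Fraction], xs: Sequence[Fraction]) -> Fraction:
    """L(f, P, [a, b]), ASSUMING f is increasing: inf over [x_{j-1}, x_j] is f(x_{j-1})."""
    _check_partition(xs)
    return sum(((xs[j] - xs[j - 1]) * f(xs[j - 1]) for j in range(1, len(xs))), Fraction(0))


def upper_sum(f: Callable[[Fraction], Fraction], xs: Sequence[Fraction]) -> Fraction:
    """U(f, P, [a, b]), ASSUMING f is increasing: sup over [x_{j-1}, x_j] is f(x_j)."""
    _check_partition(xs)
    return sum(((xs[j] - xs[j - 1]) * f(xs[j]) for j in range(1, len(xs))), Fraction(0))


P = [Fraction(0), Fraction(1, 2), Fraction(1)]
print(lower_sum(lambda x: x, P), upper_sum(lambda x: x, P))  # prints: 1/4 3/4
```

Exact fractions are used so that equalities such as those in the next example can be compared without rounding error.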


Our intuition suggests that for a partition with only a small gap between consecutive
points, the lower Riemann sum should be a bit less than the area under the graph,
and the upper Riemann sum should be a bit more than the area under the graph. 
The pictures in the next example help convey the idea of these approximations.
The base of the $j$th rectangle has length $x_j - x_{j-1}$ and has height $\inf_{[x_{j-1},\,x_j]} f$ for the
lower Riemann sum and height $\sup_{[x_{j-1},\,x_j]} f$ for the upper Riemann sum.


1.4 Example lower and upper Riemann sums 


Define $f\colon [0,1] \to \mathbf{R}$ by $f(x) = x^2$. Let $P_n$ denote the partition $0, \frac{1}{n}, \frac{2}{n}, \dots, 1$ of $[0,1]$.

[Figure: two panels show the graph of $f$ in red. In the left panel, $L(x^2, P_n, [0,1])$ is the sum of the areas of the rectangles; in the right panel, $U(x^2, P_n, [0,1])$ is the sum of the areas of the rectangles.]

The infimum of this function $f$ is attained at the left endpoint of each subinterval $\bigl[\frac{j-1}{n}, \frac{j}{n}\bigr]$; the supremum is attained at the right endpoint.

For the partition $P_n$, we have $x_j - x_{j-1} = \frac{1}{n}$ for each $j = 1, \dots, n$. Thus

$$L(x^2, P_n, [0,1]) = \frac{1}{n} \sum_{j=1}^{n} \frac{(j-1)^2}{n^2} = \frac{2n^2 - 3n + 1}{6n^2}$$

and

$$U(x^2, P_n, [0,1]) = \frac{1}{n} \sum_{j=1}^{n} \frac{j^2}{n^2} = \frac{2n^2 + 3n + 1}{6n^2},$$

as you should verify [use the formula $1 + 4 + 9 + \cdots + n^2 = \frac{n(2n^2 + 3n + 1)}{6}$].
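The two closed forms above can be spot-checked mechanically. The Python sketch below (the helper name `riemann_sums_x_squared` is hypothetical) evaluates the sums for $P_n$ directly from Definition 1.3 with exact fractions, using the observation in the example that $x^2$ attains its infimum at the left endpoint and its supremum at the right endpoint of each subinterval, and compares the results with the two fractions for $n = 1, \dots, 50$. This checks the algebra for those values of $n$ only; it does not replace the verification requested in the text.

```python
from fractions import Fraction


def riemann_sums_x_squared(n: int) -> tuple[Fraction, Fraction]:
    """Return (L(x^2, P_n, [0,1]), U(x^2, P_n, [0,1])) for P_n = 0, 1/n, ..., 1."""
    width = Fraction(1, n)  # x_j - x_{j-1} = 1/n
    # inf of x^2 on [(j-1)/n, j/n] is ((j-1)/n)^2, sup is (j/n)^2
    lower = sum((width * Fraction(j - 1, n) ** 2 for j in range(1, n + 1)), Fraction(0))
    upper = sum((width * Fraction(j, n) ** 2 for j in range(1, n + 1)), Fraction(0))
    return lower, upper


for n in range(1, 51):
    lower, upper = riemann_sums_x_squared(n)
    assert lower == Fraction(2 * n**2 - 3 * n + 1, 6 * n**2)
    assert upper == Fraction(2 * n**2 + 3 * n + 1, 6 * n**2)
print("closed forms agree for n = 1, ..., 50")
```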


The next result states that adjoining more points to a partition increases the lower 
Riemann sum and decreases the upper Riemann sum. 


1.5 inequalities with Riemann sums 


Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function and $P, P'$ are partitions of $[a,b]$
such that the list defining $P$ is a sublist of the list defining $P'$. Then

$$L(f,P,[a,b]) \le L(f,P',[a,b]) \le U(f,P',[a,b]) \le U(f,P,[a,b]).$$


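Proof To prove the first inequality, suppose $P$ is the partition $x_0, \dots, x_n$ and $P'$ is the
partition $x'_0, \dots, x'_N$ of $[a,b]$. For each $j = 1, \dots, n$, there exist $k \in \{0, \dots, N-1\}$
and a positive integer $m$ such that $x_{j-1} = x'_k < x'_{k+1} < \cdots < x'_{k+m} = x_j$. We have

$$(x_j - x_{j-1}) \inf_{[x_{j-1},\,x_j]} f = \sum_{i=1}^{m} (x'_{k+i} - x'_{k+i-1}) \inf_{[x_{j-1},\,x_j]} f \le \sum_{i=1}^{m} (x'_{k+i} - x'_{k+i-1}) \inf_{[x'_{k+i-1},\,x'_{k+i}]} f,$$

where the inequality holds because the infimum of $f$ over each subinterval $[x'_{k+i-1}, x'_{k+i}]$ is at least its infimum over the larger interval $[x_{j-1}, x_j]$. Summing over $j = 1, \dots, n$, we see that $L(f,P,[a,b]) \le L(f,P',[a,b])$.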

The middle inequality in this result follows from the observation that the infimum 
of each set of real numbers is less than or equal to the supremum of that set. 

The proof of the last inequality in this result is similar to the proof of the first 
inequality and is left to the reader. 
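A quick numerical check of 1.5 on one concrete case may help fix the idea. The Python sketch below takes $f(x) = x^2$ on $[0,1]$, a coarse partition $P$, and a partition $P'$ obtained by adjoining points to $P$, and confirms the chain of inequalities with exact fractions. It reads the infimum and supremum on each subinterval from the endpoints, which is valid here only because $x^2$ is increasing on $[0,1]$ (an assumption of the sketch). A single example illustrates the result; it does not prove it.

```python
from fractions import Fraction as F


def sums(f, xs):
    """Return (lower, upper) Riemann sums; ASSUMES f is increasing on the partition xs."""
    lower = sum((xs[j] - xs[j - 1]) * f(xs[j - 1]) for j in range(1, len(xs)))
    upper = sum((xs[j] - xs[j - 1]) * f(xs[j]) for j in range(1, len(xs)))
    return lower, upper


square = lambda x: x * x
P = [F(0), F(1, 2), F(1)]
P_refined = [F(0), F(1, 4), F(1, 2), F(3, 5), F(1)]  # P is a sublist of P_refined

L_P, U_P = sums(square, P)
L_ref, U_ref = sums(square, P_refined)
assert L_P <= L_ref <= U_ref <= U_P  # the chain of inequalities in 1.5
print(L_P, L_ref, U_ref, U_P)
```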


The following result states that if the function is fixed, then each lower Riemann 
sum is less than or equal to each upper Riemann sum. 


1.6 lower Riemann sums ≤ upper Riemann sums


Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function and $P, P'$ are partitions of $[a,b]$. Then

$$L(f,P,[a,b]) \le U(f,P',[a,b]).$$


Proof Let $P''$ be the partition of $[a,b]$ obtained by merging the lists that define $P$
and $P'$. Then

$$\begin{aligned} L(f,P,[a,b]) &\le L(f,P'',[a,b]) \\ &\le U(f,P'',[a,b]) \\ &\le U(f,P',[a,b]), \end{aligned}$$

where all three inequalities above come from 1.5.
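The one construction in this proof is the merged partition $P''$. Assuming partitions are stored as increasing lists of exact rationals, merging amounts to taking the sorted union of the two lists. The Python sketch below does that (the helper names are hypothetical) and then checks the three inequalities of the proof for $f(x) = x^2$ on $[0,1]$, an illustrative choice for which the infimum and supremum on each subinterval sit at the endpoints.

```python
from fractions import Fraction as F


def merge_partitions(p, q):
    """Return the partition whose list is the sorted union of the lists p and q."""
    if p[0] != q[0] or p[-1] != q[-1]:
        raise ValueError("both partitions must be of the same interval [a, b]")
    return sorted(set(p) | set(q))


def lower_upper(f, xs):
    # Valid only for increasing f: inf at the left endpoint, sup at the right endpoint.
    lower = sum((xs[j] - xs[j - 1]) * f(xs[j - 1]) for j in range(1, len(xs)))
    upper = sum((xs[j] - xs[j - 1]) * f(xs[j]) for j in range(1, len(xs)))
    return lower, upper


square = lambda x: x * x
P = [F(0), F(1, 3), F(1)]
P_prime = [F(0), F(1, 2), F(3, 4), F(1)]
P_merged = merge_partitions(P, P_prime)  # 0, 1/3, 1/2, 3/4, 1

L_P, _ = lower_upper(square, P)
L_M, U_M = lower_upper(square, P_merged)
_, U_Pp = lower_upper(square, P_prime)
assert L_P <= L_M <= U_M <= U_Pp  # the three inequalities in the proof of 1.6
```

Because $P''$ contains both lists, it refines $P$ and $P'$ at once, which is exactly what lets 1.5 be applied on each side.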


We have been working with lower and upper Riemann sums. Now we define the 
lower and upper Riemann integrals. 


1.7 Definition lower and upper Riemann integrals 


Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function. The lower Riemann integral
$L(f,[a,b])$ and the upper Riemann integral $U(f,[a,b])$ of $f$ are defined by

$$L(f,[a,b]) = \sup_{P} L(f,P,[a,b])$$

and

$$U(f,[a,b]) = \inf_{P} U(f,P,[a,b]),$$

where the supremum and infimum above are taken over all partitions $P$ of $[a,b]$.
In the definition above, we take the supremum (over all partitions) of the lower
Riemann sums because adjoining more points to a partition increases the lower 
Riemann sum (by 1.5) and should provide a more accurate estimate of the area under 
the graph. Similarly, in the definition above, we take the infimum (over all partitions) 
of the upper Riemann sums because adjoining more points to a partition decreases 
the upper Riemann sum (by 1.5) and should provide a more accurate estimate of the 
area under the graph. 

Our first result about the lower and upper Riemann integrals is an easy inequality. 


1.8 lower Riemann integral ≤ upper Riemann integral


Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function. Then

$$L(f,[a,b]) \le U(f,[a,b]).$$


Proof The desired inequality follows from the definitions and 1.6. 


The lower Riemann integral and the upper Riemann integral can both be reasonably 
considered to be the area under the graph of a function. Which one should we use? 
The pictures in Example 1.4 suggest that these two quantities are the same for the 
function in that example; we will soon verify this suspicion. However, as we will see 
in the next section, there are functions for which the lower Riemann integral does not 
equal the upper Riemann integral. 

Instead of choosing between the lower Riemann integral and the upper Riemann 
integral, the standard procedure in Riemann integration is to consider only functions 
for which those two quantities are equal. This decision has the huge advantage of 
making the Riemann integral behave as we wish with respect to the sum of two 
functions (see Exercise 4 in this section). 


1.9 Definition Riemann integrable; Riemann integral 


• A bounded function on a closed bounded interval is called Riemann
integrable if its lower Riemann integral equals its upper Riemann integral.

• If $f\colon [a,b] \to \mathbf{R}$ is Riemann integrable, then the Riemann integral $\int_a^b f$ is
defined by

$$\int_a^b f = L(f,[a,b]) = U(f,[a,b]).$$


Let $\mathbf{Z}$ denote the set of integers and $\mathbf{Z}^+$ denote the set of positive integers.


1.10 Example computing a Riemann integral 
Define $f\colon [0,1] \to \mathbf{R}$ by $f(x) = x^2$. Then

$$U(f,[0,1]) \le \inf_{n \in \mathbf{Z}^+} \frac{2n^2 + 3n + 1}{6n^2} = \frac{1}{3} = \sup_{n \in \mathbf{Z}^+} \frac{2n^2 - 3n + 1}{6n^2} \le L(f,[0,1]),$$

where the two inequalities above come from Example 1.4 and the two equalities
easily follow from dividing the numerators and denominators of both fractions above
by $n^2$.

The paragraph above shows that $U(f,[0,1]) \le \frac{1}{3} \le L(f,[0,1])$. When
combined with 1.8, this shows that $L(f,[0,1]) = U(f,[0,1]) = \frac{1}{3}$. Thus
$f$ is Riemann integrable and

$$\int_0^1 f = \frac{1}{3}.$$

Now we come to a key result regarding Riemann integration. Uniform continuity 
provides the major tool that makes the proof work. 


Our definition of Riemann integration is actually a small modification of Riemann’s
definition that was proposed by Gaston Darboux (1842–1917).


1.11 continuous functions are Riemann integrable 


Every continuous real-valued function on each closed bounded interval is 
Riemann integrable. 


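Proof Suppose $a, b \in \mathbf{R}$ with $a < b$ and $f\colon [a,b] \to \mathbf{R}$ is a continuous function
(thus by a standard theorem from undergraduate real analysis, $f$ is bounded and is
uniformly continuous). Let $\varepsilon > 0$. Because $f$ is uniformly continuous, there exists
$\delta > 0$ such that

$$|f(s) - f(t)| < \varepsilon \text{ for all } s, t \in [a,b] \text{ with } |s - t| < \delta. \tag{1.12}$$

Let $n \in \mathbf{Z}^+$ be such that $\frac{b-a}{n} < \delta$.
Let $P$ be the equally spaced partition $a = x_0, x_1, \dots, x_n = b$ of $[a,b]$ with

$$x_j - x_{j-1} = \frac{b-a}{n}$$

for each $j = 1, \dots, n$. Then

$$\begin{aligned} U(f,[a,b]) - L(f,[a,b]) &\le U(f,P,[a,b]) - L(f,P,[a,b]) \\ &= \frac{b-a}{n} \sum_{j=1}^{n} \Bigl( \sup_{[x_{j-1},\,x_j]} f - \inf_{[x_{j-1},\,x_j]} f \Bigr) \\ &\le (b-a)\varepsilon, \end{aligned}$$

where the first inequality follows from the definitions of $U(f,[a,b])$ and $L(f,[a,b])$
and the last inequality follows from 1.12 (each subinterval $[x_{j-1}, x_j]$ has length less than $\delta$, so the supremum minus the infimum of $f$ on it is at most $\varepsilon$).

We have shown that $U(f,[a,b]) - L(f,[a,b]) \le (b-a)\varepsilon$ for all $\varepsilon > 0$. Thus
1.8 implies that $L(f,[a,b]) = U(f,[a,b])$. Hence $f$ is Riemann integrable.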
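The proof leaves $\delta$ and $n$ abstract, but for one concrete function they can be written down. As an illustration (this choice of function is ours, not the text's), take $f(x) = x^2$ on $[0,1]$: there $|s^2 - t^2| = |s + t|\,|s - t| \le 2|s - t|$, so $\delta = \varepsilon/2$ satisfies 1.12. The Python sketch below picks the smallest $n \in \mathbf{Z}^+$ with $(b-a)/n < \delta$ and checks that the equally spaced partition gives $U(f,P,[0,1]) - L(f,P,[0,1]) \le (b-a)\varepsilon$. The helper names are hypothetical, and reading the supremum and infimum off the endpoints is valid only because $x^2$ is increasing on $[0,1]$.

```python
import math
from fractions import Fraction


def n_for_epsilon(eps: Fraction, a: Fraction, b: Fraction) -> int:
    """Smallest n in Z^+ with (b - a)/n < delta, using delta = eps/2 (valid for x^2 on [0, 1])."""
    delta = eps / 2
    # (b - a)/n < delta  <=>  n > (b - a)/delta
    return math.floor((b - a) / delta) + 1


def gap_equally_spaced(n: int, a: Fraction, b: Fraction) -> Fraction:
    """U(f,P,[a,b]) - L(f,P,[a,b]) for f(x) = x^2, ASSUMING 0 <= a so f is increasing."""
    width = (b - a) / n
    xs = [a + j * width for j in range(n + 1)]
    # sup - inf on [x_{j-1}, x_j] is x_j^2 - x_{j-1}^2 for increasing f
    return sum(width * (xs[j] ** 2 - xs[j - 1] ** 2) for j in range(1, n + 1))


a, b = Fraction(0), Fraction(1)
for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    n = n_for_epsilon(eps, a, b)
    assert (b - a) / n < eps / 2
    assert gap_equally_spaced(n, a, b) <= (b - a) * eps
```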


An alternative notation for $\int_a^b f$ is $\int_a^b f(x)\,dx$. Here $x$ is a dummy variable, so
we could also write $\int_a^b f(t)\,dt$ or use another variable. This notation becomes useful
when we want to write something like $\int_0^1 x^2\,dx$ instead of using function notation.
The next result gives a frequently used estimate for a Riemann integral.


1.13 bounds on Riemann integral 


Suppose $f\colon [a,b] \to \mathbf{R}$ is Riemann integrable. Then

$$(b-a) \inf_{[a,b]} f \le \int_a^b f \le (b-a) \sup_{[a,b]} f.$$


Proof Let $P$ be the trivial partition $a = x_0, x_1 = b$. Then

$$(b-a) \inf_{[a,b]} f = L(f,P,[a,b]) \le L(f,[a,b]) = \int_a^b f,$$


proving the first inequality in the result. 
The second inequality in the result is proved similarly and is left to the reader. 


EXERCISES 1A 


1 Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function such that

$$L(f,P,[a,b]) = U(f,P,[a,b])$$

for some partition $P$ of $[a,b]$. Prove that $f$ is a constant function on $[a,b]$.

2 Suppose $a \le s < t \le b$. Define $f\colon [a,b] \to \mathbf{R}$ by

$$f(x) = \begin{cases} 1 & \text{if } s < x < t, \\ 0 & \text{otherwise}. \end{cases}$$

Prove that $f$ is Riemann integrable on $[a,b]$ and that $\int_a^b f = t - s$.

3 Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function. Prove that $f$ is Riemann integrable if and only if for each $\varepsilon > 0$, there exists a partition $P$ of $[a,b]$ such that

$$U(f,P,[a,b]) - L(f,P,[a,b]) < \varepsilon.$$

4 Suppose $f, g\colon [a,b] \to \mathbf{R}$ are Riemann integrable. Prove that $f + g$ is Riemann integrable on $[a,b]$ and

$$\int_a^b (f + g) = \int_a^b f + \int_a^b g.$$

5 Suppose $f\colon [a,b] \to \mathbf{R}$ is Riemann integrable. Prove that the function $-f$ is Riemann integrable on $[a,b]$ and

$$\int_a^b (-f) = -\int_a^b f.$$
6 Suppose $f\colon [a,b] \to \mathbf{R}$ is Riemann integrable. Suppose $g\colon [a,b] \to \mathbf{R}$ is a function such that $g(x) = f(x)$ for all except finitely many $x \in [a,b]$. Prove that $g$ is Riemann integrable on $[a,b]$ and

$$\int_a^b g = \int_a^b f.$$

7 Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function. For $n \in \mathbf{Z}^+$, let $P_n$ denote the partition that divides $[a,b]$ into $2^n$ intervals of equal size. Prove that

$$L(f,[a,b]) = \lim_{n \to \infty} L(f,P_n,[a,b]) \quad\text{and}\quad U(f,[a,b]) = \lim_{n \to \infty} U(f,P_n,[a,b]).$$

8 Suppose $f\colon [a,b] \to \mathbf{R}$ is Riemann integrable. Prove that

$$\int_a^b f = \lim_{n \to \infty} \frac{b-a}{n} \sum_{j=1}^{n} f\Bigl(a + \frac{j(b-a)}{n}\Bigr).$$

9 Suppose $f\colon [a,b] \to \mathbf{R}$ is Riemann integrable. Prove that if $c, d \in \mathbf{R}$ and $a \le c < d \le b$, then $f$ is Riemann integrable on $[c,d]$.

[To say that $f$ is Riemann integrable on $[c,d]$ means that $f$ with its domain restricted to $[c,d]$ is Riemann integrable.]

10 Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function and $c \in (a,b)$. Prove that $f$ is Riemann integrable on $[a,b]$ if and only if $f$ is Riemann integrable on $[a,c]$ and $f$ is Riemann integrable on $[c,b]$. Furthermore, prove that if these conditions hold, then

$$\int_a^b f = \int_a^c f + \int_c^b f.$$

11 Suppose $f\colon [a,b] \to \mathbf{R}$ is Riemann integrable. Define $F\colon [a,b] \to \mathbf{R}$ by

$$F(t) = \begin{cases} 0 & \text{if } t = a, \\ \int_a^t f & \text{if } t \in (a,b]. \end{cases}$$

Prove that $F$ is continuous on $[a,b]$.

12 Suppose $f\colon [a,b] \to \mathbf{R}$ is Riemann integrable. Prove that $|f|$ is Riemann integrable and that

$$\Bigl|\int_a^b f\Bigr| \le \int_a^b |f|.$$

13 Suppose $f\colon [a,b] \to \mathbf{R}$ is an increasing function, meaning that $c, d \in [a,b]$ with $c < d$ implies $f(c) \le f(d)$. Prove that $f$ is Riemann integrable on $[a,b]$.

14 Suppose $f_1, f_2, \dots$ is a sequence of Riemann integrable functions on $[a,b]$ such that $f_1, f_2, \dots$ converges uniformly on $[a,b]$ to a function $f\colon [a,b] \to \mathbf{R}$. Prove that $f$ is Riemann integrable and

$$\int_a^b f = \lim_{n \to \infty} \int_a^b f_n.$$
1B Riemann Integral Is Not Good Enough


The Riemann integral works well enough to be taught to millions of calculus students 
around the world each year. However, the Riemann integral has several deficiencies. 
In this section, we discuss the following three issues: 


• Riemann integration does not handle functions with many discontinuities;
• Riemann integration does not handle unbounded functions;
• Riemann integration does not work well with limits.


In Chapter 2, we will start to construct a theory to remedy these problems. 
We begin with the following example of a function that is not Riemann integrable. 


1.14 Example a function that is not Riemann integrable 
Define $f\colon [0,1] \to \mathbf{R}$ by

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$

If $[a,b] \subseteq [0,1]$ with $a < b$, then

$$\inf_{[a,b]} f = 0 \quad\text{and}\quad \sup_{[a,b]} f = 1$$

because $[a,b]$ contains an irrational number and contains a rational number. Thus
$L(f,P,[0,1]) = 0$ and $U(f,P,[0,1]) = 1$ for every partition $P$ of $[0,1]$. Hence
$L(f,[0,1]) = 0$ and $U(f,[0,1]) = 1$. Because $L(f,[0,1]) \ne U(f,[0,1])$, we
conclude that $f$ is not Riemann integrable.

This example is disturbing because (as we will see later) there are far fewer
rational numbers than irrational numbers. Thus $f$ should, in some sense, have
integral 0. However, the Riemann integral of $f$ is not defined.


Trying to apply the definition of the Riemann integral to unbounded functions 
would lead to undesirable results, as shown in the next example. 


1.15 Example Riemann integration does not work with unbounded functions 


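Define $f\colon [0,1] \to \mathbf{R}$ by

$$f(x) = \begin{cases} \dfrac{1}{\sqrt{x}} & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}$$

If $x_0, x_1, \dots, x_n$ is a partition of $[0,1]$, then $\sup_{[x_0,\,x_1]} f = \infty$. Thus if we tried to apply
the definition of the upper Riemann sum to $f$, we would have $U(f,P,[0,1]) = \infty$
for every partition $P$ of $[0,1]$.

However, we should consider the area under the graph of $f$ to be 2, not $\infty$, because

$$\lim_{a \downarrow 0} \int_a^1 f = \lim_{a \downarrow 0} (2 - 2\sqrt{a}) = 2.$$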
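The value 2 can be corroborated numerically for a few fixed $a > 0$, where $f$ is continuous on $[a,1]$ and its Riemann integral there exists. The Python sketch below uses a midpoint rule (a standard numerical technique, not something from the text; the step count 200,000 is an arbitrary choice, and `midpoint_integral` is a hypothetical helper) and compares the result with the closed form $2 - 2\sqrt{a}$.

```python
import math


def midpoint_integral(f, a: float, b: float, steps: int) -> float:
    """Approximate the integral of f over [a, b] by the midpoint rule with equal steps."""
    h = (b - a) / steps
    return h * sum(f(a + (j + 0.5) * h) for j in range(steps))


f = lambda x: 1 / math.sqrt(x)  # the function of this example, for 0 < x <= 1
for a in (0.25, 0.01, 0.0001):
    approx = midpoint_integral(f, a, 1.0, 200_000)
    exact = 2 - 2 * math.sqrt(a)
    print(f"a = {a}: midpoint approx {approx:.6f}, 2 - 2*sqrt(a) = {exact:.6f}")
```

As $a$ shrinks the values approach 2, while no single Riemann sum over all of $[0,1]$ is available.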
Calculus courses deal with the previous example by defining $\int_0^1 \frac{1}{\sqrt{x}}\,dx$ to be
$\lim_{a \downarrow 0} \int_a^1 \frac{1}{\sqrt{x}}\,dx$. If using this approach and

$$f(x) = \frac{1}{\sqrt{x}} + \frac{1}{\sqrt{1-x}},$$

then we would define $\int_0^1 f$ to be

$$\lim_{a \downarrow 0} \int_a^{1/2} f + \lim_{b \uparrow 1} \int_{1/2}^{b} f.$$


However, the idea of taking Riemann integrals over subdomains and then taking 
limits can fail with more complicated functions, as shown in the next example. 


1.16 Example area seems to make sense, but Riemann integral is not defined 


Let $r_1, r_2, \dots$ be a sequence that includes each rational number in $(0,1)$ exactly
once and that includes no other numbers. For $k \in \mathbf{Z}^+$, define $f_k\colon [0,1] \to \mathbf{R}$ by

$$f_k(x) = \begin{cases} \dfrac{1}{\sqrt{x - r_k}} & \text{if } x > r_k, \\ 0 & \text{if } x \le r_k. \end{cases}$$

Define $f\colon [0,1] \to [0,\infty]$ by

$$f(x) = \sum_{k=1}^{\infty} \frac{f_k(x)}{2^k}.$$

Because every nonempty open subinterval of $[0,1]$ contains a rational number, the
function $f$ is unbounded on every such subinterval. Thus the Riemann integral of $f$
is undefined on every subinterval of $[0,1]$ with more than one element.

However, the area under the graph of each $f_k$ is less than 2. The formula defining
f then shows that we should expect the area under the graph of f to be less than 2 
rather than undefined. 
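The claim about areas can be made concrete. For each $k$, the area under the graph of $f_k$ (in the improper sense of Example 1.15) is $\int_{r_k}^1 (x - r_k)^{-1/2}\,dx = 2\sqrt{1 - r_k} < 2$. With $f = \sum_{k=1}^\infty f_k/2^k$, the expected area is therefore $\sum_{k=1}^\infty 2\sqrt{1-r_k}/2^k$, which is less than $\sum_{k=1}^\infty 2/2^k = 2$. The Python sketch below computes partial sums of this series for one particular enumeration of the rationals in $(0,1)$, ordered by denominator and then numerator; that ordering, like the helper names, is an assumption of the sketch, since the example allows any enumeration.

```python
import math
from fractions import Fraction
from itertools import islice


def rationals_in_unit_interval():
    """Yield each rational in (0, 1) exactly once, ordered by denominator, then numerator."""
    q = 2
    while True:
        for p in range(1, q):
            if math.gcd(p, q) == 1:  # skip non-reduced fractions so nothing repeats
                yield Fraction(p, q)
        q += 1


def weighted_area_partial_sum(count: int) -> float:
    """Sum of 2*sqrt(1 - r_k) / 2**k for k = 1, ..., count (area under f_k is 2*sqrt(1 - r_k))."""
    total = 0.0
    for k, r in enumerate(islice(rationals_in_unit_interval(), count), start=1):
        total += 2 * math.sqrt(1 - r) * 0.5**k
    return total


for count in (1, 10, 100, 1000):
    s = weighted_area_partial_sum(count)
    assert s < 2  # each term is below 2/2**k, so every partial sum stays below 2
    print(count, s)
```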


The next example shows that the pointwise limit of a sequence of Riemann 
integrable functions bounded by 1 need not be Riemann integrable. 
1.17 Example Riemann integration does not work well with pointwise limits 


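Let $r_1, r_2, \ldots$ be a sequence that includes each rational number in $[0,1]$ exactly once and that includes no other numbers. For $k \in \mathbf{Z}^+$, define $f_k\colon [0,1] \to \mathbf{R}$ by
$$f_k(x) = \begin{cases} 1 & \text{if } x \in \{r_1, \ldots, r_k\},\\ 0 & \text{otherwise.} \end{cases}$$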


Then each $f_k$ is Riemann integrable and $\int_0^1 f_k = 0$, as you should verify.

Define $f\colon [0,1] \to \mathbf{R}$ by
$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational},\\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$

Clearly
$$\lim_{k \to \infty} f_k(x) = f(x) \quad \text{for each } x \in [0,1].$$
However, f is not Riemann integrable (see Example 1.14) even though f is the 
pointwise limit of a sequence of integrable functions bounded by 1. 
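The claim that each $\int_0^1 f_k = 0$ can be checked numerically for small $k$. The sketch below is an illustration under stated assumptions: it fixes one particular enumeration of the rationals in $[0,1]$ (the generator `rationals_in_unit_interval` is hypothetical, ordered by denominator) and uses exact `Fraction` arithmetic with the uniform partition $x_j = j/n$. Each $r_i$ lies in at most two of the closed subintervals, so $U(f_k, P_n, [0,1]) \le 2k/n$; the lower sums are 0 because every subinterval contains an irrational number, which the code does not test.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals_in_unit_interval():
    """Yield each rational in [0, 1] exactly once (one possible enumeration)."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:        # p/q in lowest terms, so no repeats
                yield Fraction(p, q)
        q += 1


def upper_sum_fk(k, n):
    """Upper Riemann sum U(f_k, P_n, [0, 1]) for the partition x_j = j/n."""
    points = list(islice(rationals_in_unit_interval(), k))
    total = Fraction(0)
    for j in range(1, n + 1):
        left, right = Fraction(j - 1, n), Fraction(j, n)
        # sup of f_k on [left, right] is 1 exactly when some r_i lies in it
        if any(left <= r <= right for r in points):
            total += Fraction(1, n)
    return total


for n in (10, 100, 1000):
    print(n, upper_sum_fk(5, n))
```

Letting $n \to \infty$ drives the upper sums to 0, matching the lower sums.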


Because analysis relies heavily upon limits, a good theory of integration should 
allow for interchange of limits and integrals, at least when the functions are appropri- 
ately bounded. Thus the previous example points out a serious deficiency in Riemann 
integration. 

Now we come to a positive result, but as we will see, even this result indicates that 
Riemann integration has some problems. 


1.18 interchanging Riemann integral and limit 


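Suppose $a, b, M \in \mathbf{R}$ with $a < b$. Suppose $f_1, f_2, \ldots$ is a sequence of Riemann integrable functions on $[a,b]$ such that
$$|f_k(x)| \le M$$
for all $k \in \mathbf{Z}^+$ and all $x \in [a,b]$. Suppose $\lim_{k \to \infty} f_k(x)$ exists for each $x \in [a,b]$. Define $f\colon [a,b] \to \mathbf{R}$ by
$$f(x) = \lim_{k \to \infty} f_k(x).$$

If $f$ is Riemann integrable on $[a,b]$, then
$$\int_a^b f = \lim_{k \to \infty} \int_a^b f_k.$$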


The result above suffers from two problems. The first problem is the undesirable 
hypothesis that the limit function f is Riemann integrable. Ideally, that property 
would follow from the other hypotheses, but Example 1.17 shows that this need not 
be true. 

The second problem with the result 
above is that its proof seems to be more 
intricate than the proofs of other results 
involving Riemann integration. We do not 
give a proof here of the result above. A 
clean proof of a stronger result is given in 
Chapter 3, using the tools of measure theory that we develop starting with the next 
chapter. 


The difficulty in finding a simple Riemann-integration-based proof of the result above suggests that Riemann integration is not the ideal theory of integration.
EXERCISES 1B


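1 Define $f\colon [0,1] \to \mathbf{R}$ as follows:
$$f(a) = \begin{cases} 0 & \text{if } a \text{ is irrational},\\ \dfrac{1}{n} & \text{if } a \text{ is rational and } n \text{ is the smallest positive integer such that } a = \dfrac{m}{n} \text{ for some integer } m. \end{cases}$$
Show that $f$ is Riemann integrable and compute $\int_0^1 f$.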


2 Suppose $f\colon [a,b] \to \mathbf{R}$ is a bounded function. Prove that $f$ is Riemann integrable if and only if
$$L(-f, [a,b]) = -L(f, [a,b]).$$
3 Suppose $f, g\colon [a,b] \to \mathbf{R}$ are bounded functions. Prove that
$$L(f, [a,b]) + L(g, [a,b]) \le L(f + g, [a,b])$$
and
$$U(f + g, [a,b]) \le U(f, [a,b]) + U(g, [a,b]).$$


4 Give an example of bounded functions $f, g\colon [0,1] \to \mathbf{R}$ such that
$$L(f, [0,1]) + L(g, [0,1]) < L(f + g, [0,1])$$
and
$$U(f + g, [0,1]) < U(f, [0,1]) + U(g, [0,1]).$$


5 Give an example of a sequence of continuous real-valued functions $f_1, f_2, \ldots$ on $[0,1]$ and a continuous real-valued function $f$ on $[0,1]$ such that
$$f(x) = \lim_{k \to \infty} f_k(x)$$
for each $x \in [0,1]$ but
$$\int_0^1 f \neq \lim_{k \to \infty} \int_0^1 f_k.$$
Chapter 2 Measures


The last section of the previous chapter discusses several deficiencies of Riemann 
integration. To remedy those deficiencies, in this chapter we extend the notion of the 
length of an interval to a larger collection of subsets of R. This leads us to measures 
and then in the next chapter to integration with respect to measures. 

We begin this chapter by investigating outer measure, which looks promising but 
fails to have a crucial property. That failure leads us to σ-algebras and measurable
spaces. Then we define measures in an abstract context that can be applied to settings 
more general than R. Next, we construct Lebesgue measure on R as our desired 
extension of the notion of the length of an interval. 


Fifth-century AD Roman ceiling mosaic in what is now a UNESCO World Heritage site in Ravenna, Italy. Giuseppe Vitali, who in 1905 proved result 2.18 in this chapter, was born and grew up in Ravenna, where perhaps he saw this mosaic. Could the memory of the translation-invariant feature of this mosaic have suggested to Vitali the translation invariance that is the heart of his proof of 2.18?
2A Outer Measure on R


Motivation and Definition of Outer Measure 


The Riemann integral arises from approximating the area under the graph of a function 
by sums of the areas of approximating rectangles. These rectangles have heights that 
approximate the values of the function on subintervals of the function’s domain. The 
width of each approximating rectangle is the length of the corresponding subinterval. 
This length is the term $x_j - x_{j-1}$ in the definitions of the lower and upper Riemann sums (see 1.3).

To extend integration to a larger class of functions than the Riemann integrable 
functions, we will write the domain of a function as the union of subsets more 
complicated than the subintervals used in Riemann integration. We will need to 
assign a size to each of those subsets, where the size is an extension of the length of 
intervals. 

For example, we expect the size of the set $(1,3) \cup (7,10)$ to be 5 (because the first interval has length 2, the second interval has length 3, and $2 + 3 = 5$).

Assigning a size to subsets of R that are more complicated than unions of open 
intervals becomes a nontrivial task. This chapter focuses on that task and its extension 
to other contexts. In the next chapter, we will see how to use the ideas developed in 
this chapter to create a rich theory of integration. 

We begin by giving the expected definition of the length of an open interval, along 
with a notation for that length. 


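2.1 Definition length of open interval; $\ell(I)$


The length $\ell(I)$ of an open interval $I$ is defined by
$$\ell(I) = \begin{cases} b - a & \text{if } I = (a,b) \text{ for some } a, b \in \mathbf{R} \text{ with } a < b,\\ 0 & \text{if } I = \varnothing,\\ \infty & \text{if } I = (-\infty, a) \text{ or } I = (a, \infty) \text{ for some } a \in \mathbf{R},\\ \infty & \text{if } I = (-\infty, \infty). \end{cases}$$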
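Definition 2.1 translates directly into code once we fix a representation. In this minimal Python sketch (an assumption, not notation from the text) an open interval is a pair of endpoints in $[-\infty, \infty]$, written with `math.inf`, and any pair with $a \ge b$ stands for the empty interval.

```python
import math


def length(a, b):
    """ℓ((a, b)) following Definition 2.1.

    Endpoints may be -math.inf or math.inf; a pair with a >= b represents
    the empty interval (a representation chosen for this sketch).
    """
    if a >= b:
        return 0.0           # I is the empty set
    return b - a             # b - a if both endpoints are finite, math.inf otherwise


print(length(1, 3), length(2, 2), length(-math.inf, 0), length(-math.inf, math.inf))
```

Letting floating-point infinity do the arithmetic means the cases $(-\infty, a)$, $(a, \infty)$, and $(-\infty, \infty)$ need no separate branches.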


Suppose $A \subseteq \mathbf{R}$. The size of $A$ should be at most the sum of the lengths of a sequence of open intervals whose union contains $A$. Taking the infimum of all such sums gives a reasonable definition of the size of $A$, denoted $|A|$ and called the outer measure of $A$.


2.2 Definition outer measure; $|A|$


The outer measure $|A|$ of a set $A \subseteq \mathbf{R}$ is defined by
$$|A| = \inf\Bigl\{\sum_{k=1}^{\infty} \ell(I_k) : I_1, I_2, \ldots \text{ are open intervals such that } A \subseteq \bigcup_{k=1}^{\infty} I_k\Bigr\}.$$
The definition of outer measure involves an infinite sum. The infinite sum $\sum_{k=1}^{\infty} t_k$ of a sequence $t_1, t_2, \ldots$ of elements of $[0,\infty]$ is defined to be $\infty$ if some $t_k = \infty$. Otherwise, $\sum_{k=1}^{\infty} t_k$ is defined to be the limit (possibly $\infty$) of the increasing sequence $t_1, t_1 + t_2, t_1 + t_2 + t_3, \ldots$ of partial sums; thus
$$\sum_{k=1}^{\infty} t_k = \lim_{n \to \infty} \sum_{k=1}^{n} t_k.$$


2.3 Example finite sets have outer measure 0 


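Suppose $A = \{a_1, \ldots, a_n\}$ is a finite set of real numbers. Suppose $\varepsilon > 0$. Define a sequence $I_1, I_2, \ldots$ of open intervals by
$$I_k = \begin{cases} (a_k - \varepsilon, a_k + \varepsilon) & \text{if } k \le n,\\ \varnothing & \text{if } k > n. \end{cases}$$

Then $I_1, I_2, \ldots$ is a sequence of open intervals whose union contains $A$. Clearly $\sum_{k=1}^{\infty} \ell(I_k) = 2\varepsilon n$. Hence $|A| \le 2\varepsilon n$. Because $\varepsilon$ is an arbitrary positive number, this implies that $|A| = 0$.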


Good Properties of Outer Measure 


Outer measure has several nice properties that are discussed in this subsection. We 
begin with a result that improves upon the example above. 


2.4 countable sets have outer measure 0 


Every countable subset of R has outer measure 0. 


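Proof Suppose $A = \{a_1, a_2, \ldots\}$ is a countable subset of $\mathbf{R}$. Let $\varepsilon > 0$. For $k \in \mathbf{Z}^+$, let
$$I_k = \Bigl(a_k - \frac{\varepsilon}{2^k},\, a_k + \frac{\varepsilon}{2^k}\Bigr).$$

Then $I_1, I_2, \ldots$ is a sequence of open intervals whose union contains $A$. Because
$$\sum_{k=1}^{\infty} \ell(I_k) = 2\varepsilon,$$
we have $|A| \le 2\varepsilon$. Because $\varepsilon$ is an arbitrary positive number, this implies that $|A| = 0$.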
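The covering in the proof above can be computed exactly for any finite initial segment $a_1, \ldots, a_n$ of the enumeration. This Python sketch (the helper `cover_lengths` and the sample points are hypothetical) uses `Fraction` so that the total length comes out as exactly $2\varepsilon(1 - 2^{-n})$, which stays below $2\varepsilon$ however many points are added.

```python
from fractions import Fraction


def cover_lengths(points, eps):
    """Intervals (a_k - eps/2^k, a_k + eps/2^k) from the proof of 2.4 and their total length."""
    intervals = [(a - eps / 2**k, a + eps / 2**k) for k, a in enumerate(points, start=1)]
    total = sum((right - left for left, right in intervals), Fraction(0))
    return intervals, total


eps = Fraction(1, 100)
# A sample finite set: the distinct rationals in [0, 1] with denominator at most 7.
points = sorted({Fraction(p, q) for q in range(1, 8) for p in range(q + 1)})
intervals, total = cover_lengths(points, eps)
print(len(points), total, total < 2 * eps)
```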


The result above, along with the result that the set Q of rational numbers is 
countable, implies that Q has outer measure 0. We will soon show that there are far 
fewer rational numbers than real numbers (see 2.17). Thus the equation |Q| = 0 
indicates that outer measure has a good property that we want any reasonable notion 
of size to possess. 
The next result shows that outer measure does the right thing with respect to set
inclusion. 


2.5 outer measure preserves order 


Suppose $A$ and $B$ are subsets of $\mathbf{R}$ with $A \subseteq B$. Then $|A| \le |B|$.


Proof Suppose $I_1, I_2, \ldots$ is a sequence of open intervals whose union contains $B$. Then the union of this sequence of open intervals also contains $A$. Hence
$$|A| \le \sum_{k=1}^{\infty} \ell(I_k).$$
Taking the infimum over all sequences of open intervals whose union contains $B$, we have $|A| \le |B|$.


We expect that the size of a subset of R should not change if the set is shifted to 
the right or to the left. The next definition allows us to be more precise. 


2.6 Definition translation; $t + A$


If $t \in \mathbf{R}$ and $A \subseteq \mathbf{R}$, then the translation $t + A$ is defined by
$$t + A = \{t + a : a \in A\}.$$


If $t > 0$, then $t + A$ is obtained by moving the set $A$ to the right $t$ units on the real line; if $t < 0$, then $t + A$ is obtained by moving the set $A$ to the left $|t|$ units.

Translation does not change the length of an open interval. Specifically, if $t \in \mathbf{R}$ and $a, b \in [-\infty, \infty]$, then $t + (a,b) = (t + a, t + b)$ and thus $\ell\bigl(t + (a,b)\bigr) = \ell\bigl((a,b)\bigr)$. Here we are using the standard convention that $t + (-\infty) = -\infty$ and $t + \infty = \infty$.
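The identity $\ell(t + (a,b)) = \ell((a,b))$, together with the convention $t + (\pm\infty) = \pm\infty$, can be spot-checked in Python, whose floating-point infinity follows the same convention. This is a minimal sketch: representing an interval as an endpoint pair, with $a \ge b$ meaning $\varnothing$, is an assumption of the sketch, and exact equality holds here only because the sample endpoints are small binary fractions (with arbitrary floats, rounding could make the two lengths differ slightly).

```python
import math


def length(a, b):
    """ℓ((a, b)); a pair with a >= b stands for the empty interval."""
    return 0.0 if a >= b else b - a


def translate(t, interval):
    """t + (a, b) = (t + a, t + b); float arithmetic gives t + inf == inf, t - inf == -inf."""
    a, b = interval
    return (t + a, t + b)


samples = [(1.0, 4.0), (-math.inf, 2.0), (0.0, math.inf), (-math.inf, math.inf), (3.0, 3.0)]
for t in (-3.5, 0.0, 10.0):
    for interval in samples:
        assert length(*translate(t, interval)) == length(*interval)
print("translation preserved every sampled length")
```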

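The next result states that translation invariance carries over to outer measure.


2.7 outer measure is translation invariant


Suppose $t \in \mathbf{R}$ and $A \subseteq \mathbf{R}$. Then $|t + A| = |A|$.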


Proof Suppose $I_1, I_2, \ldots$ is a sequence of open intervals whose union contains $A$. Then $t + I_1, t + I_2, \ldots$ is a sequence of open intervals whose union contains $t + A$. Thus
$$|t + A| \le \sum_{k=1}^{\infty} \ell(t + I_k) = \sum_{k=1}^{\infty} \ell(I_k).$$
Taking the infimum of the last term over all sequences $I_1, I_2, \ldots$ of open intervals whose union contains $A$, we have $|t + A| \le |A|$.

To get the inequality in the other direction, note that $A = -t + (t + A)$. Thus applying the inequality from the previous paragraph, with $A$ replaced by $t + A$ and $t$ replaced by $-t$, we have $|A| = |-t + (t + A)| \le |t + A|$. Hence $|t + A| = |A|$.
The union of the intervals $(1,4)$ and $(3,5)$ is the interval $(1,5)$. Thus
$$\ell\bigl((1,4) \cup (3,5)\bigr) < \ell\bigl((1,4)\bigr) + \ell\bigl((3,5)\bigr)$$


because the left side of the inequality above equals 4 and the right side equals 5. The 
direction of the inequality above is explained by noting that the interval (3,4), which 
is the intersection of (1,4) and (3,5), has its length counted twice on the right side 
of the inequality above. 

The example of the paragraph above should provide intuition for the direction of 
the inequality in the next result. The property of satisfying the inequality in the result 
below is called countable subadditivity because it applies to sequences of subsets. 


2.8 countable subadditivity of outer measure 


Suppose $A_1, A_2, \ldots$ is a sequence of subsets of $\mathbf{R}$. Then
$$\Bigl|\bigcup_{k=1}^{\infty} A_k\Bigr| \le \sum_{k=1}^{\infty} |A_k|.$$


Proof If $|A_k| = \infty$ for some $k \in \mathbf{Z}^+$, then the inequality above clearly holds. Thus assume $|A_k| < \infty$ for all $k \in \mathbf{Z}^+$.

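Let $\varepsilon > 0$. For each $k \in \mathbf{Z}^+$, let $I_{1,k}, I_{2,k}, \ldots$ be a sequence of open intervals whose union contains $A_k$ such that
$$\sum_{j=1}^{\infty} \ell(I_{j,k}) \le \frac{\varepsilon}{2^k} + |A_k|$$
(such a sequence exists by the definition of $|A_k|$ as an infimum, because $|A_k| < \infty$). Thus
$$\sum_{k=1}^{\infty} \sum_{j=1}^{\infty} \ell(I_{j,k}) \le \varepsilon + \sum_{k=1}^{\infty} |A_k|. \tag{2.9}$$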


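The doubly indexed collection of open intervals $\{I_{j,k} : j, k \in \mathbf{Z}^+\}$ can be rearranged into a sequence of open intervals whose union contains $\bigcup_{k=1}^{\infty} A_k$ as follows, where in step $k$ (start with $k = 2$, then $k = 3, 4, 5, \ldots$) we adjoin the $k - 1$ intervals whose indices add up to $k$:
$$\underbrace{I_{1,1}}_{2},\ \underbrace{I_{1,2},\, I_{2,1}}_{3},\ \underbrace{I_{1,3},\, I_{2,2},\, I_{3,1}}_{4},\ \underbrace{I_{1,4},\, I_{2,3},\, I_{3,2},\, I_{4,1}}_{5},\ \underbrace{I_{1,5},\, I_{2,4},\, I_{3,3},\, I_{4,2},\, I_{5,1}}_{\text{sum of the two indices is } 6},\ \ldots$$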


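Inequality 2.9 shows that the sum of the lengths of the intervals listed above is less than or equal to $\varepsilon + \sum_{k=1}^{\infty} |A_k|$. Thus $\bigl|\bigcup_{k=1}^{\infty} A_k\bigr| \le \varepsilon + \sum_{k=1}^{\infty} |A_k|$. Because $\varepsilon$ is an arbitrary positive number, this implies that $\bigl|\bigcup_{k=1}^{\infty} A_k\bigr| \le \sum_{k=1}^{\infty} |A_k|$.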
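The rearrangement used in the proof can be written as a generator. In this Python sketch (the name `diagonal_indices` is hypothetical), step $s = j + k$ yields the $s - 1$ index pairs $(j, k)$ with that sum, in the same order as the list in the proof, so every pair $(j, k)$ appears exactly once, at a finite position.

```python
from itertools import count, islice


def diagonal_indices():
    """Yield (j, k) in the order I_{1,1}; I_{1,2}, I_{2,1}; I_{1,3}, I_{2,2}, I_{3,1}; ..."""
    for s in count(2):           # s = j + k, the sum of the two indices
        for j in range(1, s):    # the s - 1 pairs whose indices add up to s
            yield (j, s - j)


print(list(islice(diagonal_indices(), 10)))
# [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), (1, 4), (2, 3), (3, 2), (4, 1)]
```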


Countable subadditivity implies finite subadditivity, meaning that
$$|A_1 \cup \cdots \cup A_n| \le |A_1| + \cdots + |A_n|$$
for all $A_1, \ldots, A_n \subseteq \mathbf{R}$, because we can take $A_k = \varnothing$ for $k > n$ in 2.8.
The countable subadditivity of outer measure, as proved above, adds to our list of 
nice properties enjoyed by outer measure. 
Outer Measure of Closed Bounded Interval


One more good property of outer measure that we should prove is that if $a < b$, then the outer measure of the closed interval $[a,b]$ is $b - a$. Indeed, if $\varepsilon > 0$, then $(a - \varepsilon, b + \varepsilon), \varnothing, \varnothing, \ldots$ is a sequence of open intervals whose union contains $[a,b]$. Thus $|[a,b]| \le b - a + 2\varepsilon$. Because this inequality holds for all $\varepsilon > 0$, we conclude that
$$|[a,b]| \le b - a.$$


Is the inequality in the other direction obviously true to you? If so, think again, because a proof of the inequality in the other direction requires that the completeness of $\mathbf{R}$ is used in some form. For example, suppose $\mathbf{R}$ were a countable set (which is not true, as we will soon see, but the uncountability of $\mathbf{R}$ is not obvious). Then we would have $|[a,b]| = 0$ (by 2.4). Thus something deeper than you might suspect is going on with the ingredients needed to prove that $|[a,b]| \ge b - a$.

The following definition will be useful when we prove that $|[a,b]| \ge b - a$.


2.10 Definition open cover; finite subcover


Suppose $A \subseteq \mathbf{R}$.

• A collection $\mathcal{C}$ of open subsets of $\mathbf{R}$ is called an open cover of $A$ if $A$ is contained in the union of all the sets in $\mathcal{C}$.

• An open cover $\mathcal{C}$ of $A$ is said to have a finite subcover if $A$ is contained in the union of some finite list of sets in $\mathcal{C}$.


2.11 Example open covers and finite subcovers 


• The collection $\{(k, k+2) : k \in \mathbf{Z}^+\}$ is an open cover of $[2,5]$ because $[2,5] \subseteq \bigcup_{k=1}^{\infty} (k, k+2)$. This open cover has a finite subcover because $[2,5] \subseteq (1,3) \cup (2,4) \cup (3,5) \cup (4,6)$.

• The collection $\{(k, k+2) : k \in \mathbf{Z}^+\}$ is an open cover of $[2,\infty)$ because $[2,\infty) \subseteq \bigcup_{k=1}^{\infty} (k, k+2)$. This open cover does not have a finite subcover because there do not exist finitely many sets of the form $(k, k+2)$ whose union contains $[2,\infty)$.

• The collection $\{(0, 2 - \frac{1}{k}) : k \in \mathbf{Z}^+\}$ is an open cover of $(1,2)$ because $(1,2) \subseteq \bigcup_{k=1}^{\infty} (0, 2 - \frac{1}{k})$. This open cover does not have a finite subcover because there do not exist finitely many sets of the form $(0, 2 - \frac{1}{k})$ whose union contains $(1,2)$.
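Checking whether finitely many open intervals cover a closed bounded interval is a finite computation. The Python sketch below (the function `covers_closed_interval` is a hypothetical helper, and intervals are given as endpoint pairs) sweeps from left to right: it keeps a point `point` such that $[a, \text{point})$ is already covered, and at each step jumps to the largest right endpoint among the intervals containing `point`. Because the right endpoint of an open interval is not in that interval, the sweep must then find another interval containing it.

```python
def covers_closed_interval(a, b, intervals):
    """Return True if [a, b] is contained in the union of the open intervals (c, d).

    Invariant: every x in [a, point) lies in some interval. Each pass moves
    `point` to a strictly larger right endpoint, so the loop runs at most
    len(intervals) times.
    """
    point = a
    while True:
        reach = [d for (c, d) in intervals if c < point < d]
        if not reach:
            return False          # point is in [a, b] but in none of the intervals
        right = max(reach)
        if right > b:
            return True           # [a, b] is contained in [a, point) together with (c, right)
        point = right             # right is not in (c, right), so keep sweeping


# First example above: (1,3), (2,4), (3,5), (4,6) cover [2,5].
print(covers_closed_interval(2, 5, [(1, 3), (2, 4), (3, 5), (4, 6)]))
```

Dropping $(4,6)$ leaves the point 5 uncovered, and the function reports `False`.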


The next result will be our major tool in the proof that $|[a,b]| \ge b - a$. Although
we need only the result as stated, be sure to see Exercise 4 in this section, which 
when combined with the next result gives a characterization of the closed bounded 
subsets of R. Note that the following proof uses the completeness property of the real 
numbers (by asserting that the supremum of a certain nonempty bounded set exists). 




2.12 Heine-Borel Theorem 


Every open cover of a closed bounded subset of R has a finite subcover. 


To provide visual clues, we usually denote closed sets by $F$ and open sets by $G$.

Proof Suppose $F$ is a closed bounded subset of $\mathbf{R}$ and $\mathcal{C}$ is an open cover of $F$.

First consider the case where $F = [a,b]$ for some $a, b \in \mathbf{R}$ with $a < b$. Thus $\mathcal{C}$ is an open cover of $[a,b]$. Let
$$D = \{d \in [a,b] : [a,d] \text{ has a finite subcover from } \mathcal{C}\}.$$
Note that $a \in D$ (because $a \in G$ for some $G \in \mathcal{C}$). Thus $D$ is not the empty set. Let
$$s = \sup D.$$


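Thus $s \in [a,b]$. Hence there exists an open set $G \in \mathcal{C}$ such that $s \in G$. Let $\delta > 0$ be such that $(s - \delta, s + \delta) \subseteq G$. Because $s = \sup D$, there exist $d \in (s - \delta, s]$ and $n \in \mathbf{Z}^+$ and $G_1, \ldots, G_n \in \mathcal{C}$ such that
$$[a,d] \subseteq G_1 \cup \cdots \cup G_n.$$
Now
$$[a,d'] \subseteq G \cup G_1 \cup \cdots \cup G_n \tag{2.13}$$
for all $d' \in [s, s + \delta)$. Thus $d' \in D$ for all $d' \in [s, s + \delta) \cap [a,b]$. This implies that $s = b$ (otherwise some $d' \in (s, s + \delta) \cap [a,b]$ would be an element of $D$ larger than $s$). Furthermore, 2.13 with $d' = b$ shows that $[a,b]$ has a finite subcover from $\mathcal{C}$, completing the proof in the case where $F = [a,b]$.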

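Now suppose $F$ is an arbitrary closed bounded subset of $\mathbf{R}$ and that $\mathcal{C}$ is an open cover of $F$. Let $a, b \in \mathbf{R}$ be such that $F \subseteq [a,b]$. Now $\mathcal{C} \cup \{\mathbf{R} \setminus F\}$ is an open cover of $\mathbf{R}$ and hence is an open cover of $[a,b]$ (here $\mathbf{R} \setminus F$ denotes the set complement of $F$ in $\mathbf{R}$). By our first case, there exist $G_1, \ldots, G_n \in \mathcal{C}$ such that
$$[a,b] \subseteq G_1 \cup \cdots \cup G_n \cup (\mathbf{R} \setminus F).$$
Thus
$$F \subseteq G_1 \cup \cdots \cup G_n,$$
completing the proof.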


Saint-Affrique, the small town in southern France where Émile Borel (1871-1956) was born. Borel first stated and proved what we call the Heine-Borel Theorem in 1895. Earlier, Eduard Heine (1821-1881) and others had used similar results.
Now we can prove that closed intervals have the expected outer measure.


2.14 outer measure of a closed interval 


Suppose $a, b \in \mathbf{R}$, with $a < b$. Then $|[a,b]| = b - a$.


Proof See the first paragraph of this subsection for the proof that $|[a,b]| \le b - a$.

To prove the inequality in the other direction, suppose $I_1, I_2, \ldots$ is a sequence of open intervals such that $[a,b] \subseteq \bigcup_{k=1}^{\infty} I_k$. By the Heine-Borel Theorem (2.12), there exists $n \in \mathbf{Z}^+$ such that
$$[a,b] \subseteq I_1 \cup \cdots \cup I_n. \tag{2.15}$$
We will now prove by induction on $n$ that the inclusion above implies that
$$\sum_{k=1}^{n} \ell(I_k) \ge b - a. \tag{2.16}$$
This will then imply that $\sum_{k=1}^{\infty} \ell(I_k) \ge \sum_{k=1}^{n} \ell(I_k) \ge b - a$, completing the proof that $|[a,b]| \ge b - a$.

To get started with our induction, note that 2.15 clearly implies 2.16 if $n = 1$. Now for the induction step: Suppose $n \in \mathbf{Z}^+$ and 2.15 implies 2.16 for all choices of $a, b \in \mathbf{R}$ with $a < b$. Suppose $I_1, \ldots, I_n, I_{n+1}$ are open intervals such that
$$[a,b] \subseteq I_1 \cup \cdots \cup I_n \cup I_{n+1}.$$
Thus $b$ is in at least one of the intervals $I_1, \ldots, I_n, I_{n+1}$. By relabeling, we can assume that $b \in I_{n+1}$. Suppose $I_{n+1} = (c,d)$. If $c \le a$, then $\ell(I_{n+1}) \ge b - a$ and there is nothing further to prove; thus we can assume that $a < c < b < d$, as shown in the figure below.


[Figure: the points $a < c < b < d$ on the real line, with the open interval $I_{n+1} = (c,d)$ containing $b$.]


Hence (because no element of $[a,c]$ is in $(c,d)$)
$$[a,c] \subseteq I_1 \cup \cdots \cup I_n.$$
By our induction hypothesis, we have $\sum_{k=1}^{n} \ell(I_k) \ge c - a$. Thus
$$\begin{aligned} \sum_{k=1}^{n+1} \ell(I_k) &\ge (c - a) + \ell(I_{n+1})\\ &= (c - a) + (d - c)\\ &= d - a\\ &\ge b - a, \end{aligned}$$
completing the proof.

“Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, ‘and what is the use of a book,’ thought Alice, ‘without pictures or conversation?’” (opening paragraph of Alice’s Adventures in Wonderland, by Lewis Carroll)


The result above easily implies that the outer measure of each open interval equals 
its length (see Exercise 6). 
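The induction in the proof of 2.14 is effectively a recursive procedure: find an interval $(c,d)$ containing $b$, stop if $c \le a$, and otherwise recurse on $[a,c]$ with the remaining intervals. The Python sketch below (the name `induction_chain` is hypothetical, and intervals are endpoint pairs) follows those steps for a finite cover and returns the intervals the argument uses; by the proof, their lengths add up to at least $b - a$.

```python
def induction_chain(a, b, intervals):
    """Follow the induction in the proof of 2.14 for a finite cover of [a, b].

    `intervals` is a list of open intervals (c, d) whose union contains [a, b].
    Returns the intervals the argument picks; their lengths sum to at least b - a.
    """
    # "By relabeling": pick some interval (c, d) that contains b.
    for i, (c, d) in enumerate(intervals):
        if c < b < d:
            break
    else:
        raise ValueError("b is not covered, so the intervals do not cover [a, b]")
    if c <= a:
        return [(c, d)]                      # ℓ((c, d)) >= b - a already
    rest = intervals[:i] + intervals[i + 1:]
    # Now a < c < b < d, and [a, c] is covered by the remaining intervals.
    return induction_chain(a, c, rest) + [(c, d)]


chain = induction_chain(0.0, 1.0, [(-0.1, 0.4), (0.3, 0.8), (0.7, 1.1)])
print(chain, sum(d - c for c, d in chain))
```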
The previous result has the following important corollary. You may be familiar
with Georg Cantor’s (1845-1918) original proof of the next result. The proof using 
outer measure that is presented here gives an interesting alternative to Cantor’s proof. 


2.17 nontrivial intervals are uncountable


Every interval in R that contains at least two distinct elements is uncountable. 


Proof Suppose $I$ is an interval that contains $a, b \in \mathbf{R}$ with $a < b$. Then
$$|I| \ge |[a,b]| = b - a > 0,$$


where the first inequality above holds because outer measure preserves order (see 2.5) 
and the equality above comes from 2.14. Because every countable subset of R has 
outer measure 0 (see 2.4), we can conclude that I is uncountable. 


Outer measure led to the proof above that $\mathbf{R}$ is uncountable. This application of outer measure to prove a result that seems unconnected with outer measure is an indication that outer measure has serious mathematical value.


Outer Measure is Not Additive


We have had several results giving nice properties of outer measure. Now we come to an unpleasant property of outer measure.

If outer measure were a perfect way to assign a size as an extension of the lengths of intervals, then the outer measure of the union of two disjoint sets would equal the sum of the outer measures of the two sets. Sadly, the next result states that outer measure does not have this property.

In the next section, we begin the process of getting around the next result, which will lead us to measure theory.


2.18 nonadditivity of outer measure


There exist disjoint subsets $A$ and $B$ of $\mathbf{R}$ such that
$$|A \cup B| \neq |A| + |B|.$$


Proof For $a \in [-1,1]$, let $\tilde{a}$ be the set of numbers in $[-1,1]$ that differ from $a$ by a rational number. In other words,
$$\tilde{a} = \{c \in [-1,1] : a - c \in \mathbf{Q}\}.$$

If $a, b \in [-1,1]$ and $\tilde{a} \cap \tilde{b} \neq \varnothing$, then $\tilde{a} = \tilde{b}$. (Proof: Suppose there exists $d \in \tilde{a} \cap \tilde{b}$. Then $a - d$ and $b - d$ are rational numbers; subtracting, we conclude that $a - b$ is a rational number. The equation $a - c = (a - b) + (b - c)$ now implies that if $c \in [-1,1]$, then $a - c$ is a rational number if and only if $b - c$ is a rational number. In other words, $\tilde{a} = \tilde{b}$.)

Think of $\tilde{a}$ as the equivalence class of $a$ under the equivalence relation that declares $a, c \in [-1,1]$ to be equivalent if $a - c \in \mathbf{Q}$.
Clearly $a \in \tilde{a}$ for each $a \in [-1,1]$. Thus
$$[-1,1] = \bigcup_{a \in [-1,1]} \tilde{a}.$$
Let $V$ be a set that contains exactly one element in each of the distinct sets in
$$\{\tilde{a} : a \in [-1,1]\}.$$
In other words, for every $a \in [-1,1]$, the set $V \cap \tilde{a}$ has exactly one element.

This step involves the Axiom of Choice, as discussed after this proof. The set $V$ arises by choosing one element from each equivalence class.
Let $r_1, r_2, \ldots$ be a sequence of distinct rational numbers such that
$$[-2,2] \cap \mathbf{Q} = \{r_1, r_2, \ldots\}.$$


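Then
$$[-1,1] \subseteq \bigcup_{k=1}^{\infty} (r_k + V),$$
where the set inclusion above holds because if $a \in [-1,1]$, then letting $v$ be the unique element of $V \cap \tilde{a}$, we have $a - v \in [-2,2] \cap \mathbf{Q}$ (because $a, v \in [-1,1]$), which implies that $a = r_k + v \in r_k + V$ for some $k \in \mathbf{Z}^+$.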

The set inclusion above, the order-preserving property of outer measure (2.5), and the countable subadditivity of outer measure (2.8) imply
$$|[-1,1]| \le \sum_{k=1}^{\infty} |r_k + V|.$$
We know that $|[-1,1]| = 2$ (from 2.14). The translation invariance of outer measure (2.7) thus allows us to rewrite the inequality above as
$$2 \le \sum_{k=1}^{\infty} |V|.$$
Thus $|V| > 0$.

Note that the sets $r_1 + V, r_2 + V, \ldots$ are disjoint. (Proof: Suppose there exists $t \in (r_j + V) \cap (r_k + V)$. Then $t = r_j + v_1 = r_k + v_2$ for some $v_1, v_2 \in V$, which implies that $v_1 - v_2 = r_k - r_j \in \mathbf{Q}$. Our construction of $V$ now implies that $v_1 = v_2$, which implies that $r_j = r_k$, which implies that $j = k$.)

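Let $n \in \mathbf{Z}^+$. Clearly
$$\bigcup_{k=1}^{n} (r_k + V) \subseteq [-3,3]$$
because $V \subseteq [-1,1]$ and each $r_k \in [-2,2]$. The set inclusion above, together with 2.5 and $|[-3,3]| = 6$ (from 2.14), implies that
$$\Bigl|\bigcup_{k=1}^{n} (r_k + V)\Bigr| \le 6. \tag{2.19}$$
However
$$\sum_{k=1}^{n} |r_k + V| = \sum_{k=1}^{n} |V| = n|V|. \tag{2.20}$$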
Now 2.19 and 2.20 suggest that we choose $n \in \mathbf{Z}^+$ such that $n|V| > 6$. Thus
$$\Bigl|\bigcup_{k=1}^{n} (r_k + V)\Bigr| < \sum_{k=1}^{n} |r_k + V|. \tag{2.21}$$

If we had $|A \cup B| = |A| + |B|$ for all disjoint subsets $A, B$ of $\mathbf{R}$, then by induction on $n$ we would have $\bigl|\bigcup_{k=1}^{n} A_k\bigr| = \sum_{k=1}^{n} |A_k|$ for all disjoint subsets $A_1, \ldots, A_n$ of $\mathbf{R}$. However, 2.21 tells us that no such result holds. Thus there exist disjoint subsets $A, B$ of $\mathbf{R}$ such that $|A \cup B| \neq |A| + |B|$.


The Axiom of Choice, which belongs to set theory, states that if $\mathcal{E}$ is a set whose elements are disjoint nonempty sets, then there exists a set $D$ that contains exactly one element in each set that is an element of $\mathcal{E}$. We used the Axiom of Choice to construct the set $V$ that was used in the last proof.

A small minority of mathematicians objects to the use of the Axiom of Choice. 
Thus we will keep track of where we need to use it. Even if you do not like to use the 
Axiom of Choice, the previous result warns us away from trying to prove that outer 
measure is additive (any such proof would need to contradict the Axiom of Choice, 
which is consistent with the standard axioms of set theory). 


EXERCISES 2A 


1 Prove that if $A$ and $B$ are subsets of $\mathbf{R}$ and $|B| = 0$, then $|A \cup B| = |A|$.

2 Suppose $A \subseteq \mathbf{R}$ and $t \in \mathbf{R}$. Let $tA = \{ta : a \in A\}$. Prove that $|tA| = |t|\,|A|$. [Assume that $0 \cdot \infty$ is defined to be 0.]

3 Prove that if $A, B \subseteq \mathbf{R}$ and $|A| < \infty$, then $|B \setminus A| \ge |B| - |A|$.

4 Suppose $F$ is a subset of $\mathbf{R}$ with the property that every open cover of $F$ has a finite subcover. Prove that $F$ is closed and bounded.

5 Suppose $\mathcal{A}$ is a set of closed subsets of $\mathbf{R}$ such that $\bigcap_{F \in \mathcal{A}} F = \varnothing$. Prove that if $\mathcal{A}$ contains at least one bounded set, then there exist $n \in \mathbf{Z}^+$ and $F_1, \ldots, F_n \in \mathcal{A}$ such that $F_1 \cap \cdots \cap F_n = \varnothing$.

6 Prove that if $a, b \in \mathbf{R}$ and $a < b$, then
$$|(a,b)| = |[a,b)| = |(a,b]| = b - a.$$

7 Suppose $a, b, c, d$ are real numbers with $a < b$ and $c < d$. Prove that $|(a,b) \cup (c,d)| = (b - a) + (d - c)$ if and only if $(a,b) \cap (c,d) = \varnothing$.

8 Prove that if $A \subseteq \mathbf{R}$ and $t > 0$, then $|A| = |A \cap (-t,t)| + |A \cap (\mathbf{R} \setminus (-t,t))|$.

9 Prove that $|A| = \lim_{t \to \infty} |A \cap (-t,t)|$ for all $A \subseteq \mathbf{R}$.


10 Prove that $|[0,1] \setminus \mathbf{Q}| = 1$.

11 Prove that if $I_1, I_2, \ldots$ is a disjoint sequence of open intervals, then
$$\Bigl|\bigcup_{k=1}^{\infty} I_k\Bigr| = \sum_{k=1}^{\infty} \ell(I_k).$$

12 Suppose $r_1, r_2, \ldots$ is a sequence that contains every rational number. Let
$$F = \mathbf{R} \setminus \bigcup_{k=1}^{\infty} \Bigl(r_k - \frac{1}{2^k},\, r_k + \frac{1}{2^k}\Bigr).$$
(a) Show that $F$ is a closed subset of $\mathbf{R}$.
(b) Prove that if $I$ is an interval contained in $F$, then $I$ contains at most one element.
(c) Prove that $|F| = \infty$.

13 Suppose $\varepsilon > 0$. Prove that there exists a subset $F$ of $[0,1]$ such that $F$ is closed, every element of $F$ is an irrational number, and $|F| > 1 - \varepsilon$.

14 Consider the following figure, which is drawn accurately to scale.

[Figure: a yellow (lower) right triangle, a red rectangle, and a blue (upper) right triangle drawn to scale; horizontal axis marked at 2, 5, 8, 11, 14, 17, 20.]

(a) Show that the right triangle whose vertices are $(0,0)$, $(20,0)$, and $(20,9)$ has area 90.
[We have not defined area yet, but just use the elementary formulas for the areas of triangles and rectangles that you learned long ago.]
(b) Show that the yellow (lower) right triangle has area 27.5.
(c) Show that the red rectangle has area 45.
(d) Show that the blue (upper) right triangle has area 18.
(e) Add the results of parts (b), (c), and (d), showing that the area of the colored region is 90.5.
(f) Seeing the figure above, most people expect parts (a) and (e) to have the same result. Yet in part (a) we found area 90, and in part (e) we found area 90.5. Explain why these results differ.
[You may be tempted to think that what we have here is a two-dimensional example similar to the result about the nonadditivity of outer measure (2.18). However, genuine examples of nonadditivity require much more complicated sets than in this example.]
2B Measurable Spaces and Functions


The last result in the previous section showed that outer measure is not additive. 
Could this disappointing result perhaps be fixed by using some notion other than 
outer measure for the size of a subset of R? The next result answers this question by 
showing that there does not exist a notion of size, called the Greek letter mu ($\mu$) in the result below, that has all the desirable properties.

Property (c) in the result below is called countable additivity. Countable additivity 
is a highly desirable property because we want to be able to prove theorems about 
limits (the heart of analysis!), which requires countable additivity. 


2.22 nonexistence of extension of length to all subsets of R 


There does not exist a function $\mu$ with all the following properties:

(a) $\mu$ is a function from the set of subsets of $\mathbf{R}$ to $[0,\infty]$.

(b) $\mu(I) = \ell(I)$ for every open interval $I$ of $\mathbf{R}$.

(c) $\mu\bigl(\bigcup_{k=1}^{\infty} A_k\bigr) = \sum_{k=1}^{\infty} \mu(A_k)$ for every disjoint sequence $A_1, A_2, \ldots$ of subsets of $\mathbf{R}$.

(d) $\mu(t + A) = \mu(A)$ for every $A \subseteq \mathbf{R}$ and every $t \in \mathbf{R}$.


We will show that $\mu$ has all the properties of outer measure that were used in the proof of 2.18.

Proof Suppose there exists a function $\mu$ with all the properties listed in the statement of this result.

Observe that $\mu(\varnothing) = 0$, as follows from (b) because the empty set is an open interval with length 0.

If $A \subseteq B \subseteq \mathbf{R}$, then $\mu(A) \le \mu(B)$, as follows from (c) because we can write $B$ as the union of the disjoint sequence $A, B \setminus A, \varnothing, \varnothing, \ldots$; thus
$$\mu(B) = \mu(A) + \mu(B \setminus A) + 0 + 0 + \cdots = \mu(A) + \mu(B \setminus A) \ge \mu(A).$$


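If $a, b \in \mathbf{R}$ with $a < b$, then $(a,b) \subseteq [a,b] \subseteq (a - \varepsilon, b + \varepsilon)$ for every $\varepsilon > 0$. Thus $b - a \le \mu([a,b]) \le b - a + 2\varepsilon$ for every $\varepsilon > 0$. Hence $\mu([a,b]) = b - a$.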

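If $A_1, A_2, \ldots$ is a sequence of subsets of $\mathbf{R}$, then $A_1, A_2 \setminus A_1, A_3 \setminus (A_1 \cup A_2), \ldots$ is a disjoint sequence of subsets of $\mathbf{R}$ whose union is $\bigcup_{k=1}^{\infty} A_k$. Thus
$$\begin{aligned} \mu\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr) &= \mu\bigl(A_1 \cup (A_2 \setminus A_1) \cup (A_3 \setminus (A_1 \cup A_2)) \cup \cdots\bigr)\\ &= \mu(A_1) + \mu(A_2 \setminus A_1) + \mu\bigl(A_3 \setminus (A_1 \cup A_2)\bigr) + \cdots\\ &\le \sum_{k=1}^{\infty} \mu(A_k), \end{aligned}$$
where the second equality follows from the countable additivity of $\mu$, and the last inequality follows from the order-preserving property proved above, because each set in the disjoint sequence is contained in the corresponding $A_k$.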
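The disjoint sequence $A_1, A_2 \setminus A_1, A_3 \setminus (A_1 \cup A_2), \ldots$ used above can be illustrated with finite sets. The Python sketch below (the helper `disjointify` and the sample sets are hypothetical) builds the first terms and checks the facts the proof uses: the pieces are pairwise disjoint, they have the same union as the original sets, and each piece is a subset of the corresponding $A_k$.

```python
from itertools import combinations


def disjointify(sets):
    r"""Return A_1, A_2 \ A_1, A_3 \ (A_1 ∪ A_2), ... for a finite list of sets."""
    seen = set()
    pieces = []
    for A in sets:
        pieces.append(set(A) - seen)   # the part of A_k not already in A_1 ∪ ... ∪ A_{k-1}
        seen |= set(A)
    return pieces


A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 6}]
B = disjointify(A)
print(B)  # [{1, 2, 3}, {4}, {5}, {6}]

assert all(P.isdisjoint(Q) for P, Q in combinations(B, 2))   # pairwise disjoint
assert set().union(*B) == set().union(*A)                    # same union
assert all(P <= Q for P, Q in zip(B, A))                     # each piece lies in its A_k
```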
We have shown that $\mu$ has all the properties of outer measure that were used in the proof of 2.18. Repeating the proof of 2.18, we see that there exist disjoint subsets $A, B$ of $\mathbf{R}$ such that $\mu(A \cup B) \neq \mu(A) + \mu(B)$. Thus the disjoint sequence $A, B, \varnothing, \varnothing, \ldots$ does not satisfy the countable additivity property required by (c). This contradiction completes the proof.


σ-Algebras


The last result shows that we need to give up one of the desirable properties in our 
goal of extending the notion of size from intervals to more general subsets of R. We 
cannot give up 2.22(b) because the size of an interval needs to be its length. We 
cannot give up 2.22(c) because countable additivity is needed to prove theorems 
about limits. We cannot give up 2.22(d) because a size that is not translation invariant 
does not satisfy our intuitive notion of size as a generalization of length. 

Thus we are forced to relax the requirement in 2.22(a) that the size is defined for 
all subsets of R. Experience shows that to have a viable theory that allows for taking 
limits, the collection of subsets for which the size is defined should be closed under 
complementation and closed under countable unions. Thus we make the following 
definition. 


2.23 Definition σ-algebra


Suppose $X$ is a set and $S$ is a set of subsets of $X$. Then $S$ is called a σ-algebra on $X$ if the following three conditions are satisfied:

• $\varnothing \in S$;

• if $E \in S$, then $X \setminus E \in S$;

• if $E_1, E_2, \ldots$ is a sequence of elements of $S$, then $\bigcup_{k=1}^{\infty} E_k \in S$.
k=1 


Make sure you verify that the examples in all three bullet points below are indeed σ-algebras. The verification is obvious for the first two bullet points. For the third bullet point, you need to use the result that the countable union of countable sets is countable (see the proof of 2.8 for an example of how a doubly indexed list can be converted to a singly indexed sequence). The exercises contain some additional examples of σ-algebras.


2.24 Example σ-algebras


• Suppose $X$ is a set. Then clearly $\{\varnothing, X\}$ is a σ-algebra on $X$.

• Suppose $X$ is a set. Then clearly the set of all subsets of $X$ is a σ-algebra on $X$.

• Suppose $X$ is a set. Then the set of all subsets $E$ of $X$ such that $E$ is countable or $X \setminus E$ is countable is a σ-algebra on $X$.
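For a finite set $X$ the conditions in 2.23 can be checked mechanically: a sequence of subsets of a finite set has only finitely many distinct terms, so closure under countable unions reduces to closure under unions of two sets. The Python sketch below (the function `is_sigma_algebra` is a hypothetical helper) does this for families of subsets of a small finite set. It says nothing about infinite $X$, where the third bullet point above is the interesting case; for finite $X$ that third family is simply the set of all subsets of $X$.

```python
from itertools import combinations


def is_sigma_algebra(X, S):
    """Check the three conditions of 2.23 for a family S of subsets of a finite set X.

    For finite X, closure under countable unions reduces to closure under
    pairwise unions, since any union of members of S is a finite union.
    """
    X = frozenset(X)
    S = {frozenset(E) for E in S}
    if not all(E <= X for E in S):
        return False                         # S must consist of subsets of X
    if frozenset() not in S:
        return False                         # first bullet point
    if any(X - E not in S for E in S):
        return False                         # second bullet point
    return all(D | E in S for D, E in combinations(S, 2))   # third bullet point


print(is_sigma_algebra({1, 2, 3}, [set(), {1}, {2, 3}, {1, 2, 3}]))
```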
Now we come to some easy but important properties of σ-algebras.


2.25 σ-algebras are closed under countable intersection

Suppose $S$ is a σ-algebra on a set $X$. Then

(a) $X \in S$;

(b) if $D, E \in S$, then $D \cup E \in S$ and $D \cap E \in S$ and $D \setminus E \in S$;

(c) if $E_1, E_2, \ldots$ is a sequence of elements of $S$, then $\bigcap_{k=1}^{\infty} E_k \in S$.


Proof Because $\varnothing \in S$ and $X = X \setminus \varnothing$, the first two bullet points in the definition of σ-algebra (2.23) imply that $X \in S$, proving (a).

Suppose $D, E \in S$. Then $D \cup E$ is the union of the sequence $D, E, \varnothing, \varnothing, \ldots$ of elements of $S$. Thus the third bullet point in the definition of σ-algebra (2.23) implies that $D \cup E \in S$.

De Morgan’s Laws tell us that
$$X \setminus (D \cap E) = (X \setminus D) \cup (X \setminus E).$$
If $D, E \in S$, then the right side of the equation above is in $S$; hence $X \setminus (D \cap E) \in S$; thus the complement in $X$ of $X \setminus (D \cap E)$ is in $S$; in other words, $D \cap E \in S$.

Because $D \setminus E = D \cap (X \setminus E)$, we see that if $D, E \in S$, then $D \setminus E \in S$, completing the proof of (b).

Finally, suppose $E_1, E_2, \ldots$ is a sequence of elements of $S$. De Morgan’s Laws tell us that
$$X \setminus \bigcap_{k=1}^{\infty} E_k = \bigcup_{k=1}^{\infty} (X \setminus E_k).$$
The right side of the equation above is in $S$. Hence the left side is in $S$, which implies that $X \setminus \bigl(X \setminus \bigcap_{k=1}^{\infty} E_k\bigr) \in S$. In other words, $\bigcap_{k=1}^{\infty} E_k \in S$, proving (c).


The word measurable is used in the terminology below because in the next section 
we introduce a size function, called a measure, defined on measurable sets. 


2.26 Definition measurable space; measurable set 


• A measurable space is an ordered pair $(X, \mathcal{S})$, where $X$ is a set and $\mathcal{S}$ is a $\sigma$-algebra on $X$.

• An element of $\mathcal{S}$ is called an $\mathcal{S}$-measurable set, or just a measurable set if $\mathcal{S}$ is clear from the context.


For example, if $X = \mathbf{R}$ and $\mathcal{S}$ is the set of all subsets of $\mathbf{R}$ that are countable or have a countable complement, then the set of rational numbers is $\mathcal{S}$-measurable but the set of positive real numbers is not $\mathcal{S}$-measurable.
Borel Subsets of $\mathbf{R}$


The next result guarantees that there is a smallest $\sigma$-algebra on a set $X$ containing a given set $\mathcal{A}$ of subsets of $X$.


2.27 smallest $\sigma$-algebra containing a collection of subsets


Suppose $X$ is a set and $\mathcal{A}$ is a set of subsets of $X$. Then the intersection of all $\sigma$-algebras on $X$ that contain $\mathcal{A}$ is a $\sigma$-algebra on $X$.


Proof There is at least one $\sigma$-algebra on $X$ that contains $\mathcal{A}$ because the $\sigma$-algebra consisting of all subsets of $X$ contains $\mathcal{A}$.

Let $\mathcal{S}$ be the intersection of all $\sigma$-algebras on $X$ that contain $\mathcal{A}$. Then $\varnothing \in \mathcal{S}$ because $\varnothing$ is an element of each $\sigma$-algebra on $X$ that contains $\mathcal{A}$.

Suppose $E \in \mathcal{S}$. Thus $E$ is in every $\sigma$-algebra on $X$ that contains $\mathcal{A}$. Thus $X \setminus E$ is in every $\sigma$-algebra on $X$ that contains $\mathcal{A}$. Hence $X \setminus E \in \mathcal{S}$.

Suppose $E_1, E_2, \ldots$ is a sequence of elements of $\mathcal{S}$. Thus each $E_k$ is in every $\sigma$-algebra on $X$ that contains $\mathcal{A}$. Thus $\bigcup_{k=1}^{\infty} E_k$ is in every $\sigma$-algebra on $X$ that contains $\mathcal{A}$. Hence $\bigcup_{k=1}^{\infty} E_k \in \mathcal{S}$, which completes the proof that $\mathcal{S}$ is a $\sigma$-algebra on $X$.


Using the terminology smallest for the intersection of all $\sigma$-algebras that contain a set $\mathcal{A}$ of subsets of $X$ makes sense because the intersection of those $\sigma$-algebras is contained in every $\sigma$-algebra that contains $\mathcal{A}$.


2.28 Example smallest $\sigma$-algebra


• Suppose $X$ is a set and $\mathcal{A}$ is the set of subsets of $X$ that consist of exactly one element:

$$\mathcal{A} = \bigl\{\{x\} : x \in X\bigr\}.$$

Then the smallest $\sigma$-algebra on $X$ containing $\mathcal{A}$ is the set of all subsets $E$ of $X$ such that $E$ is countable or $X \setminus E$ is countable, as you should verify.

• Suppose $\mathcal{A} = \{(0,1), (0,\infty)\}$. Then the smallest $\sigma$-algebra on $\mathbf{R}$ containing $\mathcal{A}$ is
$$\bigl\{\varnothing, (0,1), (0,\infty), (-\infty,0] \cup [1,\infty), (-\infty,0], [1,\infty), (-\infty,1), \mathbf{R}\bigr\},$$
as you should verify.
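When $X$ is finite, the smallest $\sigma$-algebra of 2.27 can be computed by closing $\mathcal{A}$ under complements and unions until nothing new appears. The Python sketch below (hypothetical helper `generate_sigma_algebra`; finite $X$ assumed) applies this to a finite toy model of the second bullet, in which five sample points stand in for the three regions $(-\infty, 0]$, $(0, 1)$, $[1, \infty)$ cut out by $(0,1)$ and $(0,\infty)$. It recovers eight sets, one for each union of regions, matching the list above.

```python
def generate_sigma_algebra(X, A):
    """Smallest sigma-algebra on the FINITE set X containing every set in A.

    Starting from A together with the empty set and X, keep adding complements
    and pairwise unions until nothing new appears. The fixed point is closed
    under both operations, and each of its members must lie in any
    sigma-algebra containing A, so it is the smallest one.
    """
    X = frozenset(X)
    S = {frozenset(), X} | {frozenset(E) for E in A}
    while True:
        new = {X - E for E in S} | {D | E for D in S for E in S}
        if new <= S:
            return S
        S |= new


# Finite toy model: sample points standing in for (-inf, 0], (0, 1), [1, inf).
X = {-1.0, 0.0, 0.5, 1.0, 2.0}
A = [{0.5}, {0.5, 1.0, 2.0}]   # traces of (0, 1) and (0, inf) on the sample points
S = generate_sigma_algebra(X, A)
print(len(S))                  # 8
```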


Now we come to a crucial definition. 


2.29 Definition Borel set 


The smallest $\sigma$-algebra on $\mathbf{R}$ containing all open subsets of $\mathbf{R}$ is called the collection of Borel subsets of $\mathbf{R}$. An element of this $\sigma$-algebra is called a Borel set.


We have defined the collection of Borel subsets of $\mathbf{R}$ to be the smallest $\sigma$-algebra on $\mathbf{R}$ containing all the open subsets of $\mathbf{R}$. We could have defined the collection of Borel subsets of $\mathbf{R}$ to be the smallest $\sigma$-algebra on $\mathbf{R}$ containing all the open intervals (because every open subset of $\mathbf{R}$ is the union of a sequence of open intervals).

2.30 Example Borel sets


• Every closed subset of $\mathbf{R}$ is a Borel set because every closed subset of $\mathbf{R}$ is the complement of an open subset of $\mathbf{R}$.

• Every countable subset of $\mathbf{R}$ is a Borel set because if $B = \{x_1, x_2, \ldots\}$, then $B = \bigcup_{k=1}^{\infty} \{x_k\}$, which is a Borel set because each $\{x_k\}$ is a closed set.

• Every half-open interval $[a, b)$ (where $a, b \in \mathbf{R}$) is a Borel set because $[a, b) = \bigcap_{k=1}^{\infty} \bigl(a - \tfrac{1}{k}, b\bigr)$.

• If $f\colon \mathbf{R} \to \mathbf{R}$ is a function, then the set of points at which $f$ is continuous is the intersection of a sequence of open sets (see Exercise 12 in this section) and thus is a Borel set.


The intersection of every sequence of open subsets of R is a Borel set. However, 
the set of all such intersections is not the set of Borel sets (because it is not closed 
under countable unions). The set of all countable unions of countable intersections 
of open subsets of R is also not the set of Borel sets (because it is not closed under 
countable intersections). And so on ad infinitum: there is no finite procedure involving countable unions, countable intersections, and complements for constructing the collection of Borel sets.

We will see later that there exist subsets of R that are not Borel sets. However, any 
subset of R that you can write down in a concrete fashion is a Borel set. 


Inverse Images 


The next definition is used frequently in the rest of this chapter. 


2.31 Definition inverse image; $f^{-1}(A)$


If $f\colon X \to Y$ is a function and $A \subseteq Y$, then the set $f^{-1}(A)$ is defined by

$$f^{-1}(A) = \{x \in X : f(x) \in A\}.$$


2.32 Example inverse images 


Suppose $f\colon [0, 4\pi] \to \mathbf{R}$ is defined by $f(x) = \sin x$. Then
$$f^{-1}([0,1]) = [0,\pi] \cup [2\pi, 3\pi] \cup \{4\pi\}, \qquad f^{-1}((2,3)) = \varnothing,$$
as you should verify.
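A numerical spot check can make the first formula above plausible before you verify it by hand; it is not a proof. The Python sketch below (hypothetical function names) samples $x = t\pi$ for $t \in [0, 4]$ in steps of $1/100$ and compares the condition $0 \le \sin x \le 1$ with membership in $[0,\pi] \cup [2\pi, 3\pi] \cup \{4\pi\}$. The small tolerance is an assumption of the sketch, needed because floating-point evaluation gives, for example, a tiny negative number for $\sin 2\pi$.

```python
import math
from fractions import Fraction


def claimed_preimage(t):
    """Membership of x = t*pi in [0, pi] U [2pi, 3pi] U {4pi}."""
    return 0 <= t <= 1 or 2 <= t <= 3 or t == 4


def mismatches(steps=400, tol=1e-9):
    """Grid values t (with x = t*pi in [0, 4pi]) where 'sin(x) in [0, 1]'
    disagrees with the claimed formula; tol absorbs floating-point rounding."""
    bad = []
    for k in range(steps + 1):
        t = Fraction(4 * k, steps)
        in_preimage = -tol <= math.sin(float(t) * math.pi) <= 1 + tol
        if in_preimage != claimed_preimage(t):
            bad.append(t)
    return bad


print(mismatches())  # []
```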
Inverse images have good algebraic properties, as is shown in the next two results.


2.33 algebra of inverse images

Suppose $f\colon X \to Y$ is a function. Then

(a) $f^{-1}(Y \setminus A) = X \setminus f^{-1}(A)$ for every $A \subseteq Y$;

(b) $f^{-1}\bigl(\bigcup_{A \in \mathcal{A}} A\bigr) = \bigcup_{A \in \mathcal{A}} f^{-1}(A)$ for every set $\mathcal{A}$ of subsets of $Y$;

(c) $f^{-1}\bigl(\bigcap_{A \in \mathcal{A}} A\bigr) = \bigcap_{A \in \mathcal{A}} f^{-1}(A)$ for every set $\mathcal{A}$ of subsets of $Y$.


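Proof Suppose $A \subseteq Y$. For $x \in X$ we have
$$\begin{aligned}
x \in f^{-1}(Y \setminus A) &\iff f(x) \in Y \setminus A \\
&\iff f(x) \notin A \\
&\iff x \notin f^{-1}(A) \\
&\iff x \in X \setminus f^{-1}(A).
\end{aligned}$$
Thus $f^{-1}(Y \setminus A) = X \setminus f^{-1}(A)$, which proves (a).
To prove (b), suppose $\mathcal{A}$ is a set of subsets of $Y$. Then
$$\begin{aligned}
x \in f^{-1}\Bigl(\bigcup_{A \in \mathcal{A}} A\Bigr) &\iff f(x) \in \bigcup_{A \in \mathcal{A}} A \\
&\iff f(x) \in A \text{ for some } A \in \mathcal{A} \\
&\iff x \in f^{-1}(A) \text{ for some } A \in \mathcal{A} \\
&\iff x \in \bigcup_{A \in \mathcal{A}} f^{-1}(A).
\end{aligned}$$
Thus $f^{-1}\bigl(\bigcup_{A \in \mathcal{A}} A\bigr) = \bigcup_{A \in \mathcal{A}} f^{-1}(A)$, which proves (b).
Part (c) is proved in the same fashion as (b), with unions replaced by intersections and for some replaced by for every.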
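On small finite sets, 2.33 can be confirmed by brute force. The Python sketch below (hypothetical helpers `subsets`, `preimage`, `check_inverse_image_algebra`) enumerates every function $f\colon X \to Y$ for three-element $X$ and $Y$, and checks (a) for every $A \subseteq Y$ and (b), (c) for every family consisting of two subsets of $Y$; larger families are not tested in this sketch.

```python
from itertools import chain, combinations, product


def subsets(s):
    """All subsets of the finite set s, as Python sets."""
    items = sorted(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def preimage(f, X, A):
    """f^{-1}(A) = {x in X : f(x) in A}, where f is a dict defined on X."""
    return {x for x in X if f[x] in A}


def check_inverse_image_algebra(X, Y):
    """Brute-force test of 2.33(a), and of (b), (c) for two-set families,
    over every function f: X -> Y."""
    xs = sorted(X)
    for values in product(sorted(Y), repeat=len(xs)):
        f = dict(zip(xs, values))
        for A in subsets(Y):
            if preimage(f, X, Y - A) != X - preimage(f, X, A):                      # (a)
                return False
            for B in subsets(Y):
                if preimage(f, X, A | B) != preimage(f, X, A) | preimage(f, X, B):  # (b)
                    return False
                if preimage(f, X, A & B) != preimage(f, X, A) & preimage(f, X, B):  # (c)
                    return False
    return True


print(check_inverse_image_algebra({0, 1, 2}, {"a", "b", "c"}))  # True
```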


2.34 inverse image of a composition 


Suppose $f\colon X \to Y$ and $g\colon Y \to W$ are functions. Then

$$(g \circ f)^{-1}(A) = f^{-1}\bigl(g^{-1}(A)\bigr)$$

for every $A \subseteq W$.


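Proof Suppose $A \subseteq W$. For $x \in X$ we have
$$\begin{aligned}
x \in (g \circ f)^{-1}(A) &\iff (g \circ f)(x) \in A \iff g\bigl(f(x)\bigr) \in A \\
&\iff f(x) \in g^{-1}(A) \\
&\iff x \in f^{-1}\bigl(g^{-1}(A)\bigr).
\end{aligned}$$
Thus $(g \circ f)^{-1}(A) = f^{-1}\bigl(g^{-1}(A)\bigr)$.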
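A concrete instance of 2.34, as a Python sketch with hypothetical choices $f(x) = x^2$ on $X = \{-5, \ldots, 5\}$, $g(y) = y \bmod 3$ on the range of $f$, and $A = \{0\}$; both sides of the identity come out as $\{-3, 0, 3\}$.

```python
def preimage(f, domain, A):
    """f^{-1}(A) = {x in domain : f(x) in A}."""
    return {x for x in domain if f(x) in A}


def f(x):          # f: X -> Y
    return x * x


def g(y):          # g: Y -> W
    return y % 3


def g_after_f(x):  # the composition g o f
    return g(f(x))


X = set(range(-5, 6))
Y = {f(x) for x in X}           # the range of f serves as Y here
A = {0}
lhs = preimage(g_after_f, X, A)
rhs = preimage(f, X, preimage(g, Y, A))
print(sorted(lhs), lhs == rhs)  # [-3, 0, 3] True
```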
Measurable Functions


The next definition tells us which real-valued functions behave reasonably with respect to a $\sigma$-algebra on their domain.


2.35 Definition measurable function


Suppose $(X, \mathcal{S})$ is a measurable space. A function $f\colon X \to \mathbf{R}$ is called $\mathcal{S}$-measurable (or just measurable if $\mathcal{S}$ is clear from the context) if

$$f^{-1}(B) \in \mathcal{S}$$

for every Borel set $B \subseteq \mathbf{R}$.


2.36 Example measurable functions 
• If $\mathcal{S} = \{\varnothing, X\}$, then the only $\mathcal{S}$-measurable functions from $X$ to $\mathbf{R}$ are the constant functions.

• If $\mathcal{S}$ is the set of all subsets of $X$, then every function from $X$ to $\mathbf{R}$ is $\mathcal{S}$-measurable.

• If $\mathcal{S} = \{\varnothing, (-\infty, 0), [0, \infty), \mathbf{R}\}$ (which is a $\sigma$-algebra on $\mathbf{R}$), then a function $f\colon \mathbf{R} \to \mathbf{R}$ is $\mathcal{S}$-measurable if and only if $f$ is constant on $(-\infty, 0)$ and $f$ is constant on $[0, \infty)$.


Another class of examples comes from characteristic functions, which are defined below. The Greek letter chi ($\chi$) is traditionally used to denote a characteristic function.


2.37 Definition characteristic function; $\chi_E$


Suppose $E$ is a subset of a set $X$. The characteristic function of $E$ is the function $\chi_E\colon X \to \mathbf{R}$ defined by

$$\chi_E(x) = \begin{cases} 1 & \text{if } x \in E, \\ 0 & \text{if } x \notin E. \end{cases}$$

The set $X$ that contains $E$ is not explicitly included in the notation $\chi_E$ because $X$ will always be clear from the context.


2.38 Example inverse image with respect to a characteristic function 


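Suppose $(X, \mathcal{S})$ is a measurable space, $E \subseteq X$, and $B \subseteq \mathbf{R}$. Then

$$\chi_E^{-1}(B) = \begin{cases}
E & \text{if } 0 \notin B \text{ and } 1 \in B, \\
X \setminus E & \text{if } 0 \in B \text{ and } 1 \notin B, \\
X & \text{if } 0 \in B \text{ and } 1 \in B, \\
\varnothing & \text{if } 0 \notin B \text{ and } 1 \notin B.
\end{cases}$$

Thus we see that $\chi_E$ is an $\mathcal{S}$-measurable function if and only if $E \in \mathcal{S}$.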
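The four cases of 2.38 depend on $B$ only through whether $0 \in B$ and whether $1 \in B$. The Python sketch below (hypothetical helpers `chi` and `chi_preimage_by_cases`) compares the case formula with a direct computation of $\chi_E^{-1}(B)$ on a six-point set; the finite sets of numbers used for $B$ are stand-ins for Borel sets, which is harmless here because only the membership of $0$ and $1$ matters.

```python
def chi(E):
    """Characteristic function of E, returned as a Python function."""
    return lambda x: 1 if x in E else 0


def chi_preimage_by_cases(X, E, B):
    """chi_E^{-1}(B) read off from the four cases of Example 2.38."""
    zero_in, one_in = 0 in B, 1 in B
    if one_in and not zero_in:
        return set(E)
    if zero_in and not one_in:
        return set(X) - set(E)
    if zero_in and one_in:
        return set(X)
    return set()


X = set(range(6))
E = {1, 3, 4}
for B in [{1}, {0}, {0, 1, 7}, {2.5}, set()]:
    direct = {x for x in X if chi(E)(x) in B}   # definition of inverse image
    assert direct == chi_preimage_by_cases(X, E, B)
print("all four cases agree")
```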
The definition of an $\mathcal{S}$-measurable function requires the inverse image of every Borel subset of $\mathbf{R}$ to be in $\mathcal{S}$. The next result shows that to verify that a function is $\mathcal{S}$-measurable, we can check the inverse images of a much smaller collection of subsets of $\mathbf{R}$.

Note that if $f\colon X \to \mathbf{R}$ is a function and $a \in \mathbf{R}$, then
$$f^{-1}\bigl((a, \infty)\bigr) = \{x \in X : f(x) > a\}.$$

2.39 condition for measurable function 


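Suppose $(X, \mathcal{S})$ is a measurable space and $f\colon X \to \mathbf{R}$ is a function such that

$$f^{-1}\bigl((a, \infty)\bigr) \in \mathcal{S}$$

for all $a \in \mathbf{R}$. Then $f$ is an $\mathcal{S}$-measurable function.


Proof Let
$$\mathcal{T} = \{A \subseteq \mathbf{R} : f^{-1}(A) \in \mathcal{S}\}.$$

We want to show that every Borel subset of $\mathbf{R}$ is in $\mathcal{T}$. To do this, we will first show that $\mathcal{T}$ is a $\sigma$-algebra on $\mathbf{R}$.

Certainly $\varnothing \in \mathcal{T}$, because $f^{-1}(\varnothing) = \varnothing \in \mathcal{S}$.

If $A \in \mathcal{T}$, then $f^{-1}(A) \in \mathcal{S}$; hence

$$f^{-1}(\mathbf{R} \setminus A) = X \setminus f^{-1}(A) \in \mathcal{S}$$

by 2.33(a), and thus $\mathbf{R} \setminus A \in \mathcal{T}$. In other words, $\mathcal{T}$ is closed under complementation.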
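If $A_1, A_2, \ldots \in \mathcal{T}$, then $f^{-1}(A_1), f^{-1}(A_2), \ldots \in \mathcal{S}$; hence

$$f^{-1}\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr) = \bigcup_{k=1}^{\infty} f^{-1}(A_k) \in \mathcal{S}$$

by 2.33(b), and thus $\bigcup_{k=1}^{\infty} A_k \in \mathcal{T}$. In other words, $\mathcal{T}$ is closed under countable unions. Thus $\mathcal{T}$ is a $\sigma$-algebra on $\mathbf{R}$.

By hypothesis, $\mathcal{T}$ contains $\{(a, \infty) : a \in \mathbf{R}\}$. Because $\mathcal{T}$ is closed under complementation, $\mathcal{T}$ also contains $\{(-\infty, b] : b \in \mathbf{R}\}$. Because the $\sigma$-algebra $\mathcal{T}$ is closed under finite intersections (by 2.25), we see that $\mathcal{T}$ contains $\{(a, b] : a, b \in \mathbf{R}\}$. Because $(a, b) = \bigcup_{k=1}^{\infty} \bigl(a, b - \tfrac{1}{k}\bigr]$ and $(-\infty, b) = \bigcup_{k=1}^{\infty} \bigl(-k, b - \tfrac{1}{k}\bigr]$ and $\mathcal{T}$ is closed under countable unions, $\mathcal{T}$ contains every open interval; since every open subset of $\mathbf{R}$ is the union of a sequence of open intervals, we can conclude that $\mathcal{T}$ contains every open subset of $\mathbf{R}$.

Thus the $\sigma$-algebra $\mathcal{T}$ contains the smallest $\sigma$-algebra on $\mathbf{R}$ that contains all open subsets of $\mathbf{R}$. In other words, $\mathcal{T}$ contains every Borel subset of $\mathbf{R}$. Thus $f$ is an $\mathcal{S}$-measurable function.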


In the result above, we could replace the collection of sets $\{(a, \infty) : a \in \mathbf{R}\}$ by any collection of subsets of $\mathbf{R}$ such that the smallest $\sigma$-algebra containing that collection contains the Borel subsets of $\mathbf{R}$. For specific examples of such collections of subsets of $\mathbf{R}$, see Exercises 3–6.

We have been dealing with $\mathcal{S}$-measurable functions from $X$ to $\mathbf{R}$ in the context of an arbitrary set $X$ and a $\sigma$-algebra $\mathcal{S}$ on $X$. An important special case of this setup is when $X$ is a Borel subset of $\mathbf{R}$ and $\mathcal{S}$ is the set of Borel subsets of $\mathbf{R}$ that are contained in $X$ (see Exercise 11 for another way of thinking about this $\sigma$-algebra). In this special case, the $\mathcal{S}$-measurable functions are called Borel measurable.


2.40 Definition Borel measurable function 


Suppose $X \subseteq \mathbf{R}$. A function $f\colon X \to \mathbf{R}$ is called Borel measurable if $f^{-1}(B)$ is a Borel set for every Borel set $B \subseteq \mathbf{R}$.


If $X \subseteq \mathbf{R}$ and there exists a Borel measurable function $f\colon X \to \mathbf{R}$, then $X$ must be a Borel set [because $X = f^{-1}(\mathbf{R})$].

If $X \subseteq \mathbf{R}$ and $f\colon X \to \mathbf{R}$ is a function, then $f$ is a Borel measurable function if and only if $f^{-1}\bigl((a, \infty)\bigr)$ is a Borel set for every $a \in \mathbf{R}$ (use 2.39).

Suppose $X$ is a set and $f\colon X \to \mathbf{R}$ is a function. The measurability of $f$ depends upon the choice of a $\sigma$-algebra on $X$. If the $\sigma$-algebra is called $\mathcal{S}$, then we can discuss whether $f$ is an $\mathcal{S}$-measurable function. If $X$ is a Borel subset of $\mathbf{R}$, then $\mathcal{S}$ might be the set of Borel sets contained in $X$, in which case the phrase Borel measurable means the same as $\mathcal{S}$-measurable. However, whether or not $\mathcal{S}$ is a collection of Borel sets, we consider inverse images of Borel subsets of $\mathbf{R}$ when determining whether a function is $\mathcal{S}$-measurable.

The next result states that continuity interacts well with the notion of Borel 
measurability. 


2.41 every continuous function is Borel measurable 


Every continuous real-valued function defined on a Borel subset of R is a Borel 
measurable function. 


Proof Suppose $X \subseteq \mathbf{R}$ is a Borel set and $f\colon X \to \mathbf{R}$ is continuous. To prove that $f$ is Borel measurable, fix $a \in \mathbf{R}$.

If $x \in X$ and $f(x) > a$, then (by the continuity of $f$) there exists $\delta_x > 0$ such that $f(y) > a$ for all $y \in (x - \delta_x, x + \delta_x) \cap X$. Thus

$$f^{-1}\bigl((a, \infty)\bigr) = \Bigl(\bigcup_{x \in f^{-1}((a, \infty))} (x - \delta_x, x + \delta_x)\Bigr) \cap X.$$

The union inside the large parentheses above is an open subset of $\mathbf{R}$; hence its intersection with $X$ is a Borel set. Thus we can conclude that $f^{-1}\bigl((a, \infty)\bigr)$ is a Borel set.

Now 2.39 implies that f is a Borel measurable function. 


Next we come to another class of Borel measurable functions. A similar definition 
could be made for decreasing functions, with a corresponding similar result. 
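2.42 Definition increasing function


Suppose $X \subseteq \mathbf{R}$ and $f\colon X \to \mathbf{R}$ is a function.

• $f$ is called increasing if $f(x) \le f(y)$ for all $x, y \in X$ with $x < y$.

• $f$ is called strictly increasing if $f(x) < f(y)$ for all $x, y \in X$ with $x < y$.


2.43 every increasing function is Borel measurable

Every increasing function defined on a Borel subset of $\mathbf{R}$ is a Borel measurable function.


Proof Suppose $X \subseteq \mathbf{R}$ is a Borel set and $f\colon X \to \mathbf{R}$ is increasing. To prove that $f$ is Borel measurable, fix $a \in \mathbf{R}$.
Let $b = \inf f^{-1}\bigl((a, \infty)\bigr)$. Then it is easy to see that

$$f^{-1}\bigl((a, \infty)\bigr) = (b, \infty) \cap X \quad\text{or}\quad f^{-1}\bigl((a, \infty)\bigr) = [b, \infty) \cap X.$$

Either way, we can conclude that $f^{-1}\bigl((a, \infty)\bigr)$ is a Borel set.
Now 2.39 implies that $f$ is a Borel measurable function.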


The next result shows that measurability interacts well with composition. 


2.44 composition of measurable functions 


Suppose $(X, \mathcal{S})$ is a measurable space and $f\colon X \to \mathbf{R}$ is an $\mathcal{S}$-measurable function. Suppose $g$ is a real-valued Borel measurable function defined on a subset of $\mathbf{R}$ that includes the range of $f$. Then $g \circ f\colon X \to \mathbf{R}$ is an $\mathcal{S}$-measurable function.


Proof Suppose $B \subseteq \mathbf{R}$ is a Borel set. Then (see 2.34)
$$(g \circ f)^{-1}(B) = f^{-1}\bigl(g^{-1}(B)\bigr).$$

Because $g$ is a Borel measurable function, $g^{-1}(B)$ is a Borel subset of $\mathbf{R}$. Because $f$ is an $\mathcal{S}$-measurable function, $f^{-1}\bigl(g^{-1}(B)\bigr) \in \mathcal{S}$. Thus the equation above implies that $(g \circ f)^{-1}(B) \in \mathcal{S}$. Thus $g \circ f$ is an $\mathcal{S}$-measurable function.


2.45 Example if $f$ is measurable, then so are $-f$, $\tfrac{1}{2}f$, $|f|$, $f^2$


Suppose $(X, \mathcal{S})$ is a measurable space and $f\colon X \to \mathbf{R}$ is $\mathcal{S}$-measurable. Then 2.44 implies that the functions $-f$, $\tfrac{1}{2}f$, $|f|$, $f^2$ are all $\mathcal{S}$-measurable functions because each of these functions can be written as the composition of $f$ with a continuous (and thus Borel measurable) function $g$.

Specifically, take $g(x) = -x$, then $g(x) = \tfrac{1}{2}x$, then $g(x) = |x|$, and then $g(x) = x^2$.

Measurability also interacts well with algebraic operations, as shown in the next result.


2.46 algebraic operations with measurable functions 


Suppose $(X, \mathcal{S})$ is a measurable space and $f, g\colon X \to \mathbf{R}$ are $\mathcal{S}$-measurable. Then

(a) $f + g$, $f - g$, and $fg$ are $\mathcal{S}$-measurable functions;

(b) if $g(x) \ne 0$ for all $x \in X$, then $f/g$ is an $\mathcal{S}$-measurable function.


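Proof Suppose $a \in \mathbf{R}$. We will show that

$$(f + g)^{-1}\bigl((a, \infty)\bigr) = \bigcup_{r \in \mathbf{Q}} \Bigl(f^{-1}\bigl((r, \infty)\bigr) \cap g^{-1}\bigl((a - r, \infty)\bigr)\Bigr), \tag{2.47}$$

which implies that $(f + g)^{-1}\bigl((a, \infty)\bigr) \in \mathcal{S}$ (because $\mathbf{Q}$ is countable, the right side is a countable union of sets in $\mathcal{S}$).
To prove 2.47, first suppose

$$x \in (f + g)^{-1}\bigl((a, \infty)\bigr).$$

Thus $a < f(x) + g(x)$. Hence the open interval $\bigl(a - g(x), f(x)\bigr)$ is nonempty, and thus it contains some rational number $r$. This implies that $r < f(x)$, which means that $x \in f^{-1}\bigl((r, \infty)\bigr)$, and $a - g(x) < r$, which implies that $x \in g^{-1}\bigl((a - r, \infty)\bigr)$. Thus $x$ is an element of the right side of 2.47, completing the proof that the left side of 2.47 is contained in the right side.

The proof of the inclusion in the other direction is easier. Specifically, suppose $x \in f^{-1}\bigl((r, \infty)\bigr) \cap g^{-1}\bigl((a - r, \infty)\bigr)$ for some $r \in \mathbf{Q}$. Thus

$$r < f(x) \quad\text{and}\quad a - r < g(x).$$

Adding these two inequalities, we see that $a < f(x) + g(x)$. Thus $x$ is an element of the left side of 2.47, completing the proof of 2.47. Hence $f + g$ is an $\mathcal{S}$-measurable function (by 2.39).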

Example 2.45 tells us that $-g$ is an $\mathcal{S}$-measurable function. Thus $f - g$, which equals $f + (-g)$, is an $\mathcal{S}$-measurable function.

The easiest way to prove that $fg$ is an $\mathcal{S}$-measurable function uses the equation

$$fg = \frac{(f + g)^2 - f^2 - g^2}{2}.$$

The operation of squaring an $\mathcal{S}$-measurable function produces an $\mathcal{S}$-measurable function (see Example 2.45), as does the operation of multiplication by $\tfrac{1}{2}$ (again, see Example 2.45). Thus the equation above implies that $fg$ is an $\mathcal{S}$-measurable function, completing the proof of (a).

Suppose $g(x) \ne 0$ for all $x \in X$. The function defined on $\mathbf{R} \setminus \{0\}$ (a Borel subset of $\mathbf{R}$) that takes $x$ to $\tfrac{1}{x}$ is continuous and thus is a Borel measurable function (by 2.41). Now 2.44 implies that $\tfrac{1}{g}$ is an $\mathcal{S}$-measurable function. Combining this result with what we have already proved about the product of $\mathcal{S}$-measurable functions, we conclude that $\tfrac{f}{g}$ is an $\mathcal{S}$-measurable function, proving (b).
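The one step in the proof of 2.47 that may look nonconstructive is the choice of a rational number $r$ in the nonempty open interval $\bigl(a - g(x), f(x)\bigr)$. Under the extra assumption that the endpoints are themselves rational (so exact arithmetic is possible), the Python sketch below shows one explicit way to choose $r$: take $n$ with $\tfrac{1}{n}$ smaller than the length of the interval and round the left endpoint up on the grid $\tfrac{1}{n}\mathbf{Z}$. The helper name `rational_between` and the sample values are hypothetical.

```python
import math
from fractions import Fraction


def rational_between(lo, hi):
    """Return a rational r with lo < r < hi, for rational lo < hi.

    Choose n with 1/n < hi - lo; then r = (floor(n*lo) + 1)/n satisfies
    lo < r <= lo + 1/n < hi.
    """
    if not lo < hi:
        raise ValueError("need lo < hi")
    n = math.floor(1 / (hi - lo)) + 1
    return Fraction(math.floor(n * lo) + 1, n)


# The step in the proof of 2.47, with hypothetical values of a, f(x), g(x):
a, fx, gx = Fraction(1), Fraction(2, 3), Fraction(1, 2)  # a < f(x) + g(x) = 7/6
r = rational_between(a - gx, fx)
print(r, r < fx, a - r < gx)  # 4/7 True True
```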
The next result shows that the pointwise limit of a sequence of $\mathcal{S}$-measurable functions is $\mathcal{S}$-measurable. This is a highly desirable property (recall that the set of Riemann integrable functions on some interval is not closed under taking pointwise limits; see Example 1.17).


2.48 limit of S-measurable functions 


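Suppose $(X, \mathcal{S})$ is a measurable space and $f_1, f_2, \ldots$ is a sequence of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{R}$. Suppose $\lim_{k \to \infty} f_k(x)$ exists for each $x \in X$. Define $f\colon X \to \mathbf{R}$ by

$$f(x) = \lim_{k \to \infty} f_k(x).$$

Then $f$ is an $\mathcal{S}$-measurable function.


Proof Suppose $a \in \mathbf{R}$. We will show that

$$f^{-1}\bigl((a, \infty)\bigr) = \bigcup_{j=1}^{\infty} \bigcup_{m=1}^{\infty} \bigcap_{k=m}^{\infty} f_k^{-1}\Bigl(\bigl(a + \tfrac{1}{j}, \infty\bigr)\Bigr), \tag{2.49}$$

which implies that $f^{-1}\bigl((a, \infty)\bigr) \in \mathcal{S}$ (each set $f_k^{-1}\bigl((a + \tfrac{1}{j}, \infty)\bigr)$ is in $\mathcal{S}$, and $\mathcal{S}$ is closed under countable intersections by 2.25(c) and under countable unions).

To prove 2.49, first suppose $x \in f^{-1}\bigl((a, \infty)\bigr)$. Thus there exists $j \in \mathbf{Z}^+$ such that $f(x) > a + \tfrac{1}{j}$. The definition of limit now implies that there exists $m \in \mathbf{Z}^+$ such that $f_k(x) > a + \tfrac{1}{j}$ for all $k \ge m$. Thus $x$ is in the right side of 2.49, proving that the left side of 2.49 is contained in the right side.

To prove the inclusion in the other direction, suppose $x$ is in the right side of 2.49. Thus there exist $j, m \in \mathbf{Z}^+$ such that $f_k(x) > a + \tfrac{1}{j}$ for all $k \ge m$. Taking the limit as $k \to \infty$, we see that $f(x) \ge a + \tfrac{1}{j} > a$. Thus $x$ is in the left side of 2.49, completing the proof of 2.49. Thus $f$ is an $\mathcal{S}$-measurable function.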


Occasionally we need to consider functions that take values in $[-\infty, \infty]$. For example, even if we start with a sequence of real-valued functions in 2.53, we might end up with functions with values in $[-\infty, \infty]$. Thus we extend the notion of Borel sets to subsets of $[-\infty, \infty]$, as follows.


2.50 Definition Borel subsets of $[-\infty, \infty]$


A subset of $[-\infty, \infty]$ is called a Borel set if its intersection with $\mathbf{R}$ is a Borel set.


In other words, a set $C \subseteq [-\infty, \infty]$ is a Borel set if and only if there exists a Borel set $B \subseteq \mathbf{R}$ such that $C = B$ or $C = B \cup \{\infty\}$ or $C = B \cup \{-\infty\}$ or $C = B \cup \{\infty, -\infty\}$.

You should verify that with the definition above, the set of Borel subsets of $[-\infty, \infty]$ is a $\sigma$-algebra on $[-\infty, \infty]$.

Next, we extend the definition of $\mathcal{S}$-measurable functions to functions taking values in $[-\infty, \infty]$.
2.51 Definition measurable function


Suppose $(X, \mathcal{S})$ is a measurable space. A function $f\colon X \to [-\infty, \infty]$ is called $\mathcal{S}$-measurable if

$$f^{-1}(B) \in \mathcal{S}$$

for every Borel set $B \subseteq [-\infty, \infty]$.


The next result, which is analogous to 2.39, states that we need not consider all Borel subsets of $[-\infty, \infty]$ when taking inverse images to determine whether or not a function with values in $[-\infty, \infty]$ is $\mathcal{S}$-measurable.


2.52 condition for measurable function
Suppose $(X, \mathcal{S})$ is a measurable space and $f\colon X \to [-\infty, \infty]$ is a function such that

$$f^{-1}\bigl((a, \infty]\bigr) \in \mathcal{S}$$

for all $a \in \mathbf{R}$. Then $f$ is an $\mathcal{S}$-measurable function.


The proof of the result above is left to the reader (also see Exercise 27 in this 
section). 

We end this section by showing that the pointwise infimum and pointwise supremum of a sequence of $\mathcal{S}$-measurable functions are $\mathcal{S}$-measurable.


2.53 infimum and supremum of a sequence of S-measurable functions 


Suppose $(X, \mathcal{S})$ is a measurable space and $f_1, f_2, \ldots$ is a sequence of $\mathcal{S}$-measurable functions from $X$ to $[-\infty, \infty]$. Define $g, h\colon X \to [-\infty, \infty]$ by

$$g(x) = \inf\{f_k(x) : k \in \mathbf{Z}^+\} \quad\text{and}\quad h(x) = \sup\{f_k(x) : k \in \mathbf{Z}^+\}.$$

Then $g$ and $h$ are $\mathcal{S}$-measurable functions.


Proof Let $a \in \mathbf{R}$. The definition of the supremum implies that

$$h^{-1}\bigl((a, \infty]\bigr) = \bigcup_{k=1}^{\infty} f_k^{-1}\bigl((a, \infty]\bigr),$$

as you should verify. The equation above, along with 2.52, implies that $h$ is an $\mathcal{S}$-measurable function.
Note that

$$g(x) = -\sup\{-f_k(x) : k \in \mathbf{Z}^+\}$$

for all $x \in X$. Thus the result about the supremum implies that $g$ is an $\mathcal{S}$-measurable function.
EXERCISES 2B


1 Show that $\mathcal{S} = \bigl\{\bigcup_{n \in K} (n, n + 1] : K \subseteq \mathbf{Z}\bigr\}$ is a $\sigma$-algebra on $\mathbf{R}$.

2 Verify both bullet points in Example 2.28.

3 Suppose $\mathcal{S}$ is the smallest $\sigma$-algebra on $\mathbf{R}$ containing $\{(r, s] : r, s \in \mathbf{Q}\}$. Prove that $\mathcal{S}$ is the collection of Borel subsets of $\mathbf{R}$.

4 Suppose $\mathcal{S}$ is the smallest $\sigma$-algebra on $\mathbf{R}$ containing $\{(r, n] : r \in \mathbf{Q}, n \in \mathbf{Z}\}$. Prove that $\mathcal{S}$ is the collection of Borel subsets of $\mathbf{R}$.

5 Suppose $\mathcal{S}$ is the smallest $\sigma$-algebra on $\mathbf{R}$ containing $\{(r, r + 1) : r \in \mathbf{Q}\}$. Prove that $\mathcal{S}$ is the collection of Borel subsets of $\mathbf{R}$.

6 Suppose $\mathcal{S}$ is the smallest $\sigma$-algebra on $\mathbf{R}$ containing $\{[r, \infty) : r \in \mathbf{Q}\}$. Prove that $\mathcal{S}$ is the collection of Borel subsets of $\mathbf{R}$.

7 Prove that the collection of Borel subsets of $\mathbf{R}$ is translation invariant. More precisely, prove that if $B \subseteq \mathbf{R}$ is a Borel set and $t \in \mathbf{R}$, then $t + B$ is a Borel set.

8 Prove that the collection of Borel subsets of $\mathbf{R}$ is dilation invariant. More precisely, prove that if $B \subseteq \mathbf{R}$ is a Borel set and $t \in \mathbf{R}$, then $tB$ (which is defined to be $\{tb : b \in B\}$) is a Borel set.

9 Give an example of a measurable space $(X, \mathcal{S})$ and a function $f\colon X \to \mathbf{R}$ such that $|f|$ is $\mathcal{S}$-measurable but $f$ is not $\mathcal{S}$-measurable.

10 Show that the set of real numbers that have a decimal expansion with the digit 5 appearing infinitely often is a Borel set.

11 Suppose $\mathcal{T}$ is a $\sigma$-algebra on a set $Y$ and $X \in \mathcal{T}$. Let $\mathcal{S} = \{E \in \mathcal{T} : E \subseteq X\}$.
(a) Show that $\mathcal{S} = \{F \cap X : F \in \mathcal{T}\}$.
(b) Show that $\mathcal{S}$ is a $\sigma$-algebra on $X$.

12 Suppose $f\colon \mathbf{R} \to \mathbf{R}$ is a function.
(a) For $k \in \mathbf{Z}^+$, let
$$G_k = \bigl\{a \in \mathbf{R} : \text{there exists } \delta > 0 \text{ such that } |f(b) - f(c)| < \tfrac{1}{k} \text{ for all } b, c \in (a - \delta, a + \delta)\bigr\}.$$
Prove that $G_k$ is an open subset of $\mathbf{R}$ for each $k \in \mathbf{Z}^+$.
(b) Prove that the set of points at which $f$ is continuous equals $\bigcap_{k=1}^{\infty} G_k$.
(c) Conclude that the set of points at which $f$ is continuous is a Borel set.

13 Suppose $(X, \mathcal{S})$ is a measurable space, $E_1, \ldots, E_n$ are disjoint subsets of $X$, and $c_1, \ldots, c_n$ are distinct nonzero real numbers. Prove that $c_1 \chi_{E_1} + \cdots + c_n \chi_{E_n}$ is an $\mathcal{S}$-measurable function if and only if $E_1, \ldots, E_n \in \mathcal{S}$.
14 (a) Suppose $f_1, f_2, \ldots$ is a sequence of functions from a set $X$ to $\mathbf{R}$. Explain why
$$\{x \in X : \text{the sequence } f_1(x), f_2(x), \ldots \text{ has a limit in } \mathbf{R}\} = \bigcap_{n=1}^{\infty} \bigcup_{j=1}^{\infty} \bigcap_{k=j}^{\infty} (f_j - f_k)^{-1}\Bigl(\bigl(-\tfrac{1}{n}, \tfrac{1}{n}\bigr)\Bigr).$$
(b) Suppose $(X, \mathcal{S})$ is a measurable space and $f_1, f_2, \ldots$ is a sequence of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{R}$. Prove that
$$\{x \in X : \text{the sequence } f_1(x), f_2(x), \ldots \text{ has a limit in } \mathbf{R}\}$$
is an $\mathcal{S}$-measurable subset of $X$.

15 Suppose $X$ is a set and $E_1, E_2, \ldots$ is a disjoint sequence of subsets of $X$ such that $\bigcup_{k=1}^{\infty} E_k = X$. Let $\mathcal{S} = \bigl\{\bigcup_{k \in K} E_k : K \subseteq \mathbf{Z}^+\bigr\}$.
(a) Show that $\mathcal{S}$ is a $\sigma$-algebra on $X$.
(b) Prove that a function from $X$ to $\mathbf{R}$ is $\mathcal{S}$-measurable if and only if the function is constant on $E_k$ for every $k \in \mathbf{Z}^+$.

16 Suppose $\mathcal{S}$ is a $\sigma$-algebra on a set $X$ and $A \subseteq X$. Let
$$\mathcal{S}_A = \{E \in \mathcal{S} : A \subseteq E \text{ or } A \cap E = \varnothing\}.$$
(a) Prove that $\mathcal{S}_A$ is a $\sigma$-algebra on $X$.
(b) Suppose $f\colon X \to \mathbf{R}$ is a function. Prove that $f$ is measurable with respect to $\mathcal{S}_A$ if and only if $f$ is measurable with respect to $\mathcal{S}$ and $f$ is constant on $A$.

17 Suppose $X$ is a Borel subset of $\mathbf{R}$ and $f\colon X \to \mathbf{R}$ is a function such that $\{x \in X : f \text{ is not continuous at } x\}$ is a countable set. Prove $f$ is a Borel measurable function.

18 Suppose $f\colon \mathbf{R} \to \mathbf{R}$ is differentiable at every element of $\mathbf{R}$. Prove that $f'$ is a Borel measurable function from $\mathbf{R}$ to $\mathbf{R}$.

19 Suppose $X$ is a nonempty set and $\mathcal{S}$ is the $\sigma$-algebra on $X$ consisting of all subsets of $X$ that are either countable or have a countable complement in $X$. Give a characterization of the $\mathcal{S}$-measurable real-valued functions on $X$.

20 Suppose $(X, \mathcal{S})$ is a measurable space and $f, g\colon X \to \mathbf{R}$ are $\mathcal{S}$-measurable functions. Prove that if $f(x) > 0$ for all $x \in X$, then $f^g$ (which is the function whose value at $x \in X$ equals $f(x)^{g(x)}$) is an $\mathcal{S}$-measurable function.

21 Prove 2.52.

22 Suppose $B \subseteq \mathbf{R}$ and $f\colon B \to \mathbf{R}$ is an increasing function. Prove that $f$ is continuous at every element of $B$ except for a countable subset of $B$.
23 Suppose $f\colon \mathbf{R} \to \mathbf{R}$ is a strictly increasing function. Prove that the inverse function $f^{-1}\colon f(\mathbf{R}) \to \mathbf{R}$ is a continuous function.
[Note that this exercise does not have as a hypothesis that $f$ is continuous.]

24 Suppose $B \subseteq \mathbf{R}$ is a Borel set and $f\colon B \to \mathbf{R}$ is an increasing function. Prove that $f(B)$ is a Borel set.

25 Suppose $B \subseteq \mathbf{R}$ and $f\colon B \to \mathbf{R}$ is an increasing function. Prove that there exists a sequence $f_1, f_2, \ldots$ of strictly increasing functions from $B$ to $\mathbf{R}$ such that
$$f(x) = \lim_{k \to \infty} f_k(x)$$
for every $x \in B$.

26 Suppose $B \subseteq \mathbf{R}$ and $f\colon B \to \mathbf{R}$ is a bounded increasing function. Prove that there exists an increasing function $g\colon \mathbf{R} \to \mathbf{R}$ such that $g(x) = f(x)$ for all $x \in B$.

27 Prove or give a counterexample: If $(X, \mathcal{S})$ is a measurable space and
$$f\colon X \to [-\infty, \infty]$$
is a function such that $f^{-1}\bigl((a, \infty)\bigr) \in \mathcal{S}$ for every $a \in \mathbf{R}$, then $f$ is an $\mathcal{S}$-measurable function.

28 Suppose $f\colon B \to \mathbf{R}$ is a Borel measurable function. Define $g\colon \mathbf{R} \to \mathbf{R}$ by
$$g(x) = \begin{cases} f(x) & \text{if } x \in B, \\ 0 & \text{if } x \in \mathbf{R} \setminus B. \end{cases}$$
Prove that $g$ is a Borel measurable function.

29 Give an example of a measurable space $(X, \mathcal{S})$ and a family $\{f_t\}_{t \in \mathbf{R}}$ such that each $f_t$ is an $\mathcal{S}$-measurable function from $X$ to $[0, 1]$, but the function $f\colon X \to [0, 1]$ defined by
$$f(x) = \sup\{f_t(x) : t \in \mathbf{R}\}$$
is not $\mathcal{S}$-measurable.
[Compare this exercise to 2.53, where the index set is $\mathbf{Z}^+$ rather than $\mathbf{R}$.]

30 Show that
$$\lim_{j \to \infty} \Bigl(\lim_{k \to \infty} \bigl(\cos(j!\,\pi x)\bigr)^{2k}\Bigr) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational} \end{cases}$$
for every $x \in \mathbf{R}$.
[This example is due to Henri Lebesgue.]
2C Measures and Their Properties


Definition and Examples of Measures 


The original motivation for the next definition came from trying to extend the notion 
of the length of an interval. However, the definition below allows us to discuss size 
in many more contexts. For example, we will see later that the area of a set in the 
plane or the volume of a set in higher dimensions fits into this structure. The word 
measure allows us to use a single word instead of repeating theorems for length, area, 
and volume. 


2.54 Definition measure 


Suppose $X$ is a set and $\mathcal{S}$ is a $\sigma$-algebra on $X$. A measure on $(X, \mathcal{S})$ is a function $\mu\colon \mathcal{S} \to [0, \infty]$ such that $\mu(\varnothing) = 0$ and

$$\mu\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) = \sum_{k=1}^{\infty} \mu(E_k)$$

for every disjoint sequence $E_1, E_2, \ldots$ of sets in $\mathcal{S}$.


The countable additivity that forms the key part of the definition above allows us to prove good limit theorems. Note that countable additivity implies finite additivity: if $\mu$ is a measure on $(X, \mathcal{S})$ and $E_1, \ldots, E_n$ are disjoint sets in $\mathcal{S}$, then

$$\mu(E_1 \cup \cdots \cup E_n) = \mu(E_1) + \cdots + \mu(E_n),$$

as follows from applying the equation $\mu(\varnothing) = 0$ and countable additivity to the disjoint sequence $E_1, \ldots, E_n, \varnothing, \varnothing, \ldots$ of sets in $\mathcal{S}$.

In the mathematical literature, sometimes a measure on $(X, \mathcal{S})$ is just called a measure on $X$ if the $\sigma$-algebra $\mathcal{S}$ is clear from the context.

The concept of a measure, as defined here, is sometimes called a positive measure (although the phrase nonnegative measure would be more accurate).
2.55 Example measures 


• If $X$ is a set, then counting measure is the measure $\mu$ defined on the $\sigma$-algebra of all subsets of $X$ by setting $\mu(E) = n$ if $E$ is a finite set containing exactly $n$ elements and $\mu(E) = \infty$ if $E$ is not a finite set.

• Suppose $X$ is a set, $\mathcal{S}$ is a $\sigma$-algebra on $X$, and $c \in X$. Define the Dirac measure $\delta_c$ on $(X, \mathcal{S})$ by
$$\delta_c(E) = \begin{cases} 1 & \text{if } c \in E, \\ 0 & \text{if } c \notin E. \end{cases}$$
This measure is named in honor of mathematician and physicist Paul Dirac (1902–1984), who won the Nobel Prize for Physics in 1933 for his work combining relativity and quantum mechanics at the atomic level.
• Suppose $X$ is a set, $\mathcal{S}$ is a $\sigma$-algebra on $X$, and $w\colon X \to [0, \infty]$ is a function. Define a measure $\mu$ on $(X, \mathcal{S})$ by
$$\mu(E) = \sum_{x \in E} w(x)$$
for $E \in \mathcal{S}$. [Here the sum is defined as the supremum of all finite subsums $\sum_{x \in D} w(x)$ as $D$ ranges over all finite subsets of $E$.]

• Suppose $X$ is a set and $\mathcal{S}$ is the $\sigma$-algebra on $X$ consisting of all subsets of $X$ that are either countable or have a countable complement in $X$. Define a measure $\mu$ on $(X, \mathcal{S})$ by
$$\mu(E) = \begin{cases} 0 & \text{if } E \text{ is countable}, \\ 3 & \text{if } E \text{ is uncountable}. \end{cases}$$

• Suppose $\mathcal{S}$ is the $\sigma$-algebra on $\mathbf{R}$ consisting of all subsets of $\mathbf{R}$. Then the function that takes a set $E \subseteq \mathbf{R}$ to $|E|$ (the outer measure of $E$) is not a measure because it is not finitely additive (see 2.18).

• Suppose $\mathcal{B}$ is the $\sigma$-algebra on $\mathbf{R}$ consisting of all Borel subsets of $\mathbf{R}$. We will see in the next section that outer measure is a measure on $(\mathbf{R}, \mathcal{B})$.
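For a finite set $X$ with the $\sigma$-algebra of all subsets, a disjoint sequence of sets has only finitely many nonempty terms, so (since $\mu(\varnothing) = 0$) countable additivity reduces to finite additivity. The Python sketch below (hypothetical helper names) implements counting measure, a Dirac measure, and a weighted measure with one infinite weight on a three-point set, and checks finite additivity for every pair of disjoint subsets. It does not model the infinite examples in the list above.

```python
import math
from itertools import chain, combinations


def subsets(s):
    """All subsets of the finite set s, as Python sets."""
    items = sorted(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def counting_measure(E):
    """Counting measure; this sketch only ever passes finite sets."""
    return len(E)


def dirac(c):
    """Dirac measure delta_c: 1 if c is in E, else 0."""
    return lambda E: 1 if c in E else 0


def weighted(w):
    """mu(E) = sum of w[x] over x in E; a weight may be math.inf."""
    return lambda E: sum((w[x] for x in E), 0)


def finitely_additive(mu, X):
    """Check mu(empty) == 0 and mu(D | E) == mu(D) + mu(E) for all disjoint D, E in X."""
    if mu(set()) != 0:
        return False
    return all(mu(D | E) == mu(D) + mu(E)
               for D in subsets(X) for E in subsets(X) if not D & E)


X = {"p", "q", "r"}
w = {"p": 0.5, "q": 2, "r": math.inf}
for mu in (counting_measure, dirac("q"), weighted(w)):
    print(finitely_additive(mu, X))  # True
```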


The following terminology is frequently useful. 


2.56 Definition measure space 


A measure space is an ordered triple $(X, \mathcal{S}, \mu)$, where $X$ is a set, $\mathcal{S}$ is a $\sigma$-algebra on $X$, and $\mu$ is a measure on $(X, \mathcal{S})$.


Properties of Measures 


The hypothesis that $\mu(D) < \infty$ is needed in part (b) of the next result to avoid undefined expressions of the form $\infty - \infty$.


2.57 measure preserves order; measure of a set difference


Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $D, E \in \mathcal{S}$ are such that $D \subseteq E$. Then

(a) $\mu(D) \le \mu(E)$;
(b) $\mu(E \setminus D) = \mu(E) - \mu(D)$ provided that $\mu(D) < \infty$.


Proof Because $E = D \cup (E \setminus D)$ and this is a disjoint union, we have

$$\mu(E) = \mu(D) + \mu(E \setminus D) \ge \mu(D),$$

which proves (a). If $\mu(D) < \infty$, then subtracting $\mu(D)$ from both sides of the equation above proves (b).
The countable additivity property of measures applies to disjoint countable unions.
The following countable subadditivity property applies to countable unions that may 
not be disjoint unions. 


2.58 countable subadditivity 


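Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $E_1, E_2, \ldots \in \mathcal{S}$. Then

$$\mu\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) \le \sum_{k=1}^{\infty} \mu(E_k).$$


Proof Let $D_1 = \varnothing$ and $D_k = E_1 \cup \cdots \cup E_{k-1}$ for $k \ge 2$. Then
$$E_1 \setminus D_1,\ E_2 \setminus D_2,\ E_3 \setminus D_3,\ \ldots$$
is a disjoint sequence of subsets of $X$ whose union equals $\bigcup_{k=1}^{\infty} E_k$. Thus

$$\begin{aligned}
\mu\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) &= \mu\Bigl(\bigcup_{k=1}^{\infty} (E_k \setminus D_k)\Bigr) \\
&= \sum_{k=1}^{\infty} \mu(E_k \setminus D_k) \\
&\le \sum_{k=1}^{\infty} \mu(E_k),
\end{aligned}$$

where the second line above follows from the countable additivity of $\mu$ and the last line above follows from 2.57(a).


Note that countable subadditivity implies finite subadditivity: if $\mu$ is a measure on $(X, \mathcal{S})$ and $E_1, \ldots, E_n$ are sets in $\mathcal{S}$, then

$$\mu(E_1 \cup \cdots \cup E_n) \le \mu(E_1) + \cdots + \mu(E_n),$$

as follows from applying the equation $\mu(\varnothing) = 0$ and countable subadditivity to the sequence $E_1, \ldots, E_n, \varnothing, \varnothing, \ldots$ of sets in $\mathcal{S}$.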
The next result shows that measures behave well with increasing unions. 


2.59 measure of an increasing union 


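Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $E_1 \subseteq E_2 \subseteq \cdots$ is an increasing sequence of sets in $\mathcal{S}$. Then

$$\mu\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) = \lim_{k \to \infty} \mu(E_k).$$


Proof If $\mu(E_k) = \infty$ for some $k \in \mathbf{Z}^+$, then the equation above holds because both sides equal $\infty$. Hence we can consider only the case where $\mu(E_k) < \infty$ for all $k \in \mathbf{Z}^+$.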
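For convenience, let $E_0 = \varnothing$. Then
$$\bigcup_{k=1}^{\infty} E_k = \bigcup_{j=1}^{\infty} (E_j \setminus E_{j-1}),$$

where the union on the right side is a disjoint union. Thus

$$\begin{aligned}
\mu\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) &= \sum_{j=1}^{\infty} \mu(E_j \setminus E_{j-1}) \\
&= \lim_{k \to \infty} \sum_{j=1}^{k} \mu(E_j \setminus E_{j-1}) \\
&= \lim_{k \to \infty} \sum_{j=1}^{k} \bigl(\mu(E_j) - \mu(E_{j-1})\bigr) \\
&= \lim_{k \to \infty} \mu(E_k),
\end{aligned}$$

as desired.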
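As an illustration of 2.59 (an assumed example, not one from the text), take the measure $\mu(E) = \sum_{k \in E} 2^{-k}$ on $\mathbf{Z}^+$, a special case of the weighted measures in 2.55, and the increasing sets $E_n = \{1, \ldots, n\}$. Their union is $\mathbf{Z}^+$, with $\mu(\mathbf{Z}^+) = 1$, while $\mu(E_n) = 1 - 2^{-n}$ increases to $1$. The Python sketch below computes the first ten values exactly with fractions.

```python
from fractions import Fraction


def mu_initial_segment(n):
    """mu({1, ..., n}) for the hypothetical measure mu(E) = sum of 2**(-k) over k in E."""
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))


values = [mu_initial_segment(n) for n in range(1, 11)]
print(values[-1], float(1 - values[-1]))  # 1023/1024 0.0009765625
```

The gap $1 - \mu(E_n) = 2^{-n}$ tends to $0$, consistent with 2.59.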


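Measures also behave well with respect to decreasing intersections (but see Exercise 10, which shows that the hypothesis $\mu(E_1) < \infty$ below cannot be deleted).


2.60 measure of a decreasing intersection


Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $E_1 \supseteq E_2 \supseteq \cdots$ is a decreasing sequence of sets in $\mathcal{S}$, with $\mu(E_1) < \infty$. Then

$$\mu\Bigl(\bigcap_{k=1}^{\infty} E_k\Bigr) = \lim_{k \to \infty} \mu(E_k).$$


Proof One of De Morgan's Laws tells us that

$$E_1 \setminus \bigcap_{k=1}^{\infty} E_k = \bigcup_{k=1}^{\infty} (E_1 \setminus E_k).$$

Now $E_1 \setminus E_1 \subseteq E_1 \setminus E_2 \subseteq E_1 \setminus E_3 \subseteq \cdots$ is an increasing sequence of sets in $\mathcal{S}$. Thus 2.59, applied to the equation above, implies that

$$\mu\Bigl(E_1 \setminus \bigcap_{k=1}^{\infty} E_k\Bigr) = \lim_{k \to \infty} \mu(E_1 \setminus E_k).$$

Use 2.57(b) to rewrite the equation above as

$$\mu(E_1) - \mu\Bigl(\bigcap_{k=1}^{\infty} E_k\Bigr) = \mu(E_1) - \lim_{k \to \infty} \mu(E_k),$$

which implies our desired result.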
The next result is intuitively plausible: we expect that the measure of the union of two sets equals the measure of the first set plus the measure of the second set minus the measure of the set that has been counted twice.


2.61 measure of a union 


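Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $D, E \in \mathcal{S}$, with $\mu(D \cap E) < \infty$. Then

$$\mu(D \cup E) = \mu(D) + \mu(E) - \mu(D \cap E).$$


Proof We have
$$D \cup E = \bigl(D \setminus (D \cap E)\bigr) \cup \bigl(E \setminus (D \cap E)\bigr) \cup (D \cap E).$$
The right side of the equation above is a disjoint union. Thus

$$\begin{aligned}
\mu(D \cup E) &= \mu\bigl(D \setminus (D \cap E)\bigr) + \mu\bigl(E \setminus (D \cap E)\bigr) + \mu(D \cap E) \\
&= \bigl(\mu(D) - \mu(D \cap E)\bigr) + \bigl(\mu(E) - \mu(D \cap E)\bigr) + \mu(D \cap E) \\
&= \mu(D) + \mu(E) - \mu(D \cap E),
\end{aligned}$$

as desired.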
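A brute-force check of 2.61 for one finite weighted measure, as a Python sketch with hypothetical weights. Exact `Fraction` arithmetic avoids floating-point rounding, and all values are finite, so the hypothesis $\mu(D \cap E) < \infty$ holds automatically; every pair $D, E$ of subsets of a four-point set is tested.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(s):
    """All subsets of the finite set s, as Python sets."""
    items = sorted(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def weighted_measure(w):
    """mu(E) = sum of the weights of the points of E (exact Fraction arithmetic)."""
    return lambda E: sum((w[x] for x in E), Fraction(0))


w = {"a": Fraction(1), "b": Fraction(1, 2), "c": Fraction(3), "d": Fraction(0)}
mu = weighted_measure(w)
X = set(w)

failures = [(D, E) for D in subsets(X) for E in subsets(X)
            if mu(D | E) != mu(D) + mu(E) - mu(D & E)]
print(len(failures))  # 0
```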


EXERCISES 2C 


1 Explain why there does not exist a measure space (X,S, jt) with the property 
that {u(E): E € S} = (0,1). 
Let 22" denote the o-algebra on Z* consisting of all subsets of Z*. 


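2 Suppose $\mu$ is a measure on $(\mathbf{Z}^+, 2^{\mathbf{Z}^+})$. Prove that there is a sequence $w_1, w_2, \ldots$ in $[0, \infty]$ such that
$$\mu(E) = \sum_{k \in E} w_k$$
for every set $E \subseteq \mathbf{Z}^+$.

3 Give an example of a measure $\mu$ on $(\mathbf{Z}^+, 2^{\mathbf{Z}^+})$ such that
$$\{\mu(E) : E \subseteq \mathbf{Z}^+\} = [0, 1].$$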


4 Give an example of a measure space $(X, \mathcal{S}, \mu)$ such that
$$\{\mu(E) : E \in \mathcal{S}\} = \{\infty\} \cup \bigcup_{k=0}^{\infty} [3k, 3k+1].$$

5 Suppose $(X, \mathcal{S}, \mu)$ is a measure space such that $\mu(X) < \infty$. Prove that if $\mathcal{A}$ is a set of disjoint sets in $\mathcal{S}$ such that $\mu(A) > 0$ for every $A \in \mathcal{A}$, then $\mathcal{A}$ is a countable set.


6 Find all $c \in [3, \infty)$ such that there exists a measure space $(X, \mathcal{S}, \mu)$ with $\{\mu(E) : E \in \mathcal{S}\} = [0, 1] \cup [3, c]$.

7 Give an example of a measure space $(X, \mathcal{S}, \mu)$ such that $\{\mu(E) : E \in \mathcal{S}\} = [0, 1] \cup [3, \infty]$.


8 Give an example of a set $X$, a $\sigma$-algebra $\mathcal{S}$ of subsets of $X$, a set $\mathcal{A}$ of subsets of $X$ such that the smallest $\sigma$-algebra on $X$ containing $\mathcal{A}$ is $\mathcal{S}$, and two measures $\mu$ and $\nu$ on $(X, \mathcal{S})$ such that $\mu(A) = \nu(A)$ for all $A \in \mathcal{A}$ and $\mu(X) = \nu(X) < \infty$, but $\mu \neq \nu$.

9 Suppose $\mu$ and $\nu$ are measures on a measurable space $(X, \mathcal{S})$. Prove that $\mu + \nu$ is a measure on $(X, \mathcal{S})$. [Here $\mu + \nu$ is the usual sum of two functions: if $E \in \mathcal{S}$, then $(\mu + \nu)(E) = \mu(E) + \nu(E)$.]


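10 Give an example of a measure space $(X, \mathcal{S}, \mu)$ and a decreasing sequence $E_1 \supseteq E_2 \supseteq \cdots$ of sets in $\mathcal{S}$ such that
$$\mu\Bigl(\bigcap_{k=1}^{\infty} E_k\Bigr) \neq \lim_{k \to \infty} \mu(E_k).$$

11 Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $C, D, E \in \mathcal{S}$ are such that
$$\mu(C \cap D) < \infty, \quad \mu(C \cap E) < \infty, \quad\text{and}\quad \mu(D \cap E) < \infty.$$
Find and prove a formula for $\mu(C \cup D \cup E)$ in terms of $\mu(C)$, $\mu(D)$, $\mu(E)$, $\mu(C \cap D)$, $\mu(C \cap E)$, $\mu(D \cap E)$, and $\mu(C \cap D \cap E)$.

12 Suppose $X$ is a set and $\mathcal{S}$ is the $\sigma$-algebra of all subsets $E$ of $X$ such that $E$ is countable or $X \setminus E$ is countable. Give a complete description of the set of all measures on $(X, \mathcal{S})$.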
2D Lebesgue Measure


Additivity of Outer Measure on Borel Sets 


Recall that there exist disjoint sets $A, B \subseteq \mathbf{R}$ such that $|A \cup B| \neq |A| + |B|$ (see 2.18). Thus outer measure, despite its name, is not a measure on the $\sigma$-algebra of all subsets of $\mathbf{R}$.

Our main goal in this section is to prove that outer measure, when restricted to the Borel subsets of $\mathbf{R}$, is a measure. Throughout this section, be careful about trying to simplify proofs by applying properties of measures to outer measure, even if those properties seem intuitively plausible. For example, there are subsets $A \subseteq B \subseteq \mathbf{R}$ with $|A| < \infty$ but $|B \setminus A| \neq |B| - |A|$ [compare to 2.57(b)].

The next result is our first step toward the goal of proving that outer measure 
restricted to the Borel sets is a measure. 


2.62 additivity of outer measure if one of the sets is open 


Suppose $A$ and $G$ are disjoint subsets of $\mathbf{R}$ and $G$ is open. Then
$$|A \cup G| = |A| + |G|.$$


Proof We can assume that $|G| < \infty$ because otherwise both $|A \cup G|$ and $|A| + |G|$ equal $\infty$.

Subadditivity (see 2.8) implies that $|A \cup G| \le |A| + |G|$. Thus we need to prove the inequality only in the other direction.

First consider the case where $G = (a, b)$ for some $a, b \in \mathbf{R}$ with $a < b$. We can assume that $a, b \notin A$ (because changing a set by at most two points does not change its outer measure). Let $I_1, I_2, \ldots$ be a sequence of open intervals whose union contains $A \cup G$. For each $n \in \mathbf{Z}^+$, let


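$$J_n = I_n \cap (-\infty, a), \quad K_n = I_n \cap (a, b), \quad L_n = I_n \cap (b, \infty).$$
Then
$$\ell(I_n) = \ell(J_n) + \ell(K_n) + \ell(L_n).$$
Now $J_1, L_1, J_2, L_2, \ldots$ is a sequence of open intervals whose union contains $A$ and $K_1, K_2, \ldots$ is a sequence of open intervals whose union contains $G$. Thus
$$\begin{aligned}
\sum_{n=1}^{\infty} \ell(I_n) &= \sum_{n=1}^{\infty} \bigl(\ell(J_n) + \ell(L_n)\bigr) + \sum_{n=1}^{\infty} \ell(K_n) \\
&\ge |A| + |G|.
\end{aligned}$$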


Because the inequality above holds for every sequence of open intervals whose union contains $A \cup G$, it implies that $|A \cup G| \ge |A| + |G|$, completing the proof that $|A \cup G| = |A| + |G|$ in this special case.

Using induction on $m$, we can now conclude that if $m \in \mathbf{Z}^+$ and $G$ is a union of $m$ disjoint open intervals that are all disjoint from $A$, then $|A \cup G| = |A| + |G|$.

Now suppose $G$ is an arbitrary open subset of $\mathbf{R}$ that is disjoint from $A$. Then $G = \bigcup_{n=1}^{\infty} I_n$ for some sequence of disjoint open intervals $I_1, I_2, \ldots$, each of which is disjoint from $A$. Now for each $m \in \mathbf{Z}^+$ we have
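$$\begin{aligned}
|A \cup G| &\ge \Bigl|A \cup \Bigl(\bigcup_{n=1}^{m} I_n\Bigr)\Bigr| \\
&= |A| + \sum_{n=1}^{m} \ell(I_n).
\end{aligned}$$
Thus
$$\begin{aligned}
|A \cup G| &\ge |A| + \sum_{n=1}^{\infty} \ell(I_n) \\
&\ge |A| + |G|,
\end{aligned}$$
completing the proof that $|A \cup G| = |A| + |G|$.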
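The only computation in the special case $G = (a, b)$ is the splitting of each $I_n$ into $J_n$, $K_n$, and $L_n$. The Python sketch below performs that split for an open interval given by its endpoints and confirms that the lengths add up. The helper names and the use of `None` for the empty set are illustrative choices, not notation from the text; the points $a$ and $b$ themselves are discarded, which is harmless because a two-point set has outer measure 0.

```python
import math


def intersect(c, d, lo, hi):
    """Intersection of the open intervals (c, d) and (lo, hi); None stands for the empty set."""
    left, right = max(c, lo), min(d, hi)
    return (left, right) if left < right else None


def length(interval):
    """Length of an open interval (possibly infinite); the empty set has length 0."""
    if interval is None:
        return 0
    left, right = interval
    return right - left


def split(interval, a, b):
    """Return (J, K, L) = (I n (-inf, a), I n (a, b), I n (b, inf)) for I = interval."""
    c, d = interval
    return (intersect(c, d, -math.inf, a),
            intersect(c, d, a, b),
            intersect(c, d, b, math.inf))


I = (0, 10)
J, K, L = split(I, 2, 5)
print(J, K, L)                                         # (0, 2) (2, 5) (5, 10)
print(length(I) == length(J) + length(K) + length(L))  # True
```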


The next result shows that the outer measure of the disjoint union of two sets is 
what we expect if at least one of the two sets is closed. 


2.63 additivity of outer measure if one of the sets is closed 


Suppose $A$ and $F$ are disjoint subsets of $\mathbf{R}$ and $F$ is closed. Then
$$|A \cup F| = |A| + |F|.$$


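Proof Suppose $I_1, I_2, \ldots$ is a sequence of open intervals whose union contains $A \cup F$. Let $G = \bigcup_{k=1}^{\infty} I_k$. Thus $G$ is an open set with $A \cup F \subseteq G$. Hence $A \subseteq G \setminus F$, which implies that
$$|A| \le |G \setminus F|. \tag{2.64}$$
Because $G \setminus F = G \cap (\mathbf{R} \setminus F)$, we know that $G \setminus F$ is an open set. Hence we can apply 2.62 to the disjoint union $G = F \cup (G \setminus F)$, getting
$$|G| = |F| + |G \setminus F|.$$
Adding $|F|$ to both sides of 2.64 and then using the equation above gives
$$\begin{aligned}
|A| + |F| &\le |G| \\
&\le \sum_{k=1}^{\infty} \ell(I_k).
\end{aligned}$$
Because this holds for every sequence of open intervals whose union contains $A \cup F$, we have $|A| + |F| \le |A \cup F|$. Subadditivity gives the reverse inequality, so $|A| + |F| = |A \cup F|$.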


Recall that the collection of Borel sets is the smallest $\sigma$-algebra on $\mathbf{R}$ that contains all open subsets of $\mathbf{R}$. The next result provides an extremely useful tool for approximating a Borel set by a closed set.


2.65 approximation of Borel sets from below by closed sets 


Suppose $B \subseteq \mathbf{R}$ is a Borel set. Then for every $\varepsilon > 0$, there exists a closed set $F \subseteq B$ such that $|B \setminus F| < \varepsilon$.
Proof Let
$$\mathcal{L} = \{D \subseteq \mathbf{R} : \text{for every } \varepsilon > 0, \text{ there exists a closed set } F \subseteq D \text{ such that } |D \setminus F| < \varepsilon\}.$$


The strategy of the proof is to show that $\mathcal{L}$ is a $\sigma$-algebra. Then because $\mathcal{L}$ contains every closed subset of $\mathbf{R}$ (if $D \subseteq \mathbf{R}$ is closed, take $F = D$ in the definition of $\mathcal{L}$), by taking complements we can conclude that $\mathcal{L}$ contains every open subset of $\mathbf{R}$ and thus every Borel subset of $\mathbf{R}$.

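To get started with proving that $\mathcal{L}$ is a $\sigma$-algebra, we want to prove that $\mathcal{L}$ is closed under countable intersections. Thus suppose $D_1, D_2, \ldots$ is a sequence in $\mathcal{L}$. Let $\varepsilon > 0$. For each $k \in \mathbf{Z}^+$, there exists a closed set $F_k$ such that
$$F_k \subseteq D_k \quad\text{and}\quad |D_k \setminus F_k| < \frac{\varepsilon}{2^k}.$$
Thus $\bigcap_{k=1}^{\infty} F_k$ is a closed set and
$$\bigcap_{k=1}^{\infty} F_k \subseteq \bigcap_{k=1}^{\infty} D_k \quad\text{and}\quad \Bigl(\bigcap_{k=1}^{\infty} D_k\Bigr) \setminus \Bigl(\bigcap_{k=1}^{\infty} F_k\Bigr) \subseteq \bigcup_{k=1}^{\infty} (D_k \setminus F_k).$$
The last set inclusion and the countable subadditivity of outer measure (see 2.8) imply that
$$\Bigl|\Bigl(\bigcap_{k=1}^{\infty} D_k\Bigr) \setminus \Bigl(\bigcap_{k=1}^{\infty} F_k\Bigr)\Bigr| \le \sum_{k=1}^{\infty} |D_k \setminus F_k| < \varepsilon.$$
Thus $\bigcap_{k=1}^{\infty} D_k \in \mathcal{L}$, proving that $\mathcal{L}$ is closed under countable intersections.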

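Now we want to prove that $\mathcal{L}$ is closed under complementation. Suppose $D \in \mathcal{L}$ and $\varepsilon > 0$. We want to show that there is a closed subset of $\mathbf{R} \setminus D$ whose set difference with $\mathbf{R} \setminus D$ has outer measure less than $\varepsilon$, which will allow us to conclude that $\mathbf{R} \setminus D \in \mathcal{L}$.

First we consider the case where $|D| < \infty$. Let $F \subseteq D$ be a closed set such that $|D \setminus F| < \frac{\varepsilon}{2}$. The definition of outer measure implies that there exists an open set $G$ such that $D \subseteq G$ and $|G| < |D| + \frac{\varepsilon}{2}$. Now $\mathbf{R} \setminus G$ is a closed set and $\mathbf{R} \setminus G \subseteq \mathbf{R} \setminus D$. Also, we have
$$(\mathbf{R} \setminus D) \setminus (\mathbf{R} \setminus G) = G \setminus D \subseteq G \setminus F.$$
Thus
$$\begin{aligned}
|(\mathbf{R} \setminus D) \setminus (\mathbf{R} \setminus G)| &\le |G \setminus F| \\
&= |G| - |F| \\
&= (|G| - |D|) + (|D| - |F|) \\
&< \tfrac{\varepsilon}{2} + |D \setminus F| \\
&< \varepsilon,
\end{aligned}$$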
where the equality in the second line above comes from applying 2.63 to the disjoint union $G = (G \setminus F) \cup F$ (note that $|F| \le |D| < \infty$), and the fourth line above uses subadditivity applied to the union $D = (D \setminus F) \cup F$. The last inequality above shows that $\mathbf{R} \setminus D \in \mathcal{L}$, as desired.

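Now, still assuming that $D \in \mathcal{L}$ and $\varepsilon > 0$, we consider the case where $|D| = \infty$. For $k \in \mathbf{Z}^+$, let $D_k = D \cap [-k, k]$. Because $D_k \in \mathcal{L}$ (it is the intersection of $D$ and the closed set $[-k, k]$, both of which are in $\mathcal{L}$) and $|D_k| < \infty$, the previous case implies that $\mathbf{R} \setminus D_k \in \mathcal{L}$. Clearly $D = \bigcup_{k=1}^{\infty} D_k$. Thus
$$\mathbf{R} \setminus D = \bigcap_{k=1}^{\infty} (\mathbf{R} \setminus D_k).$$
Because $\mathcal{L}$ is closed under countable intersections, the equation above implies that $\mathbf{R} \setminus D \in \mathcal{L}$, which completes the proof that $\mathcal{L}$ is a $\sigma$-algebra.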


Now we can prove that the outer measure of the disjoint union of two sets is what 
we expect if at least one of the two sets is a Borel set. 


2.66 additivity of outer measure if one of the sets is a Borel set 


Suppose $A$ and $B$ are disjoint subsets of $\mathbf{R}$ and $B$ is a Borel set. Then
$$|A \cup B| = |A| + |B|.$$


Proof Let $\varepsilon > 0$. Let $F$ be a closed set such that $F \subseteq B$ and $|B \setminus F| < \varepsilon$ (see 2.65). Thus
$$\begin{aligned}
|A \cup B| &\ge |A \cup F| \\
&= |A| + |F| \\
&= |A| + |B| - |B \setminus F| \\
&\ge |A| + |B| - \varepsilon,
\end{aligned}$$
where the second and third lines above follow from 2.63 [use $B = (B \setminus F) \cup F$ for the third line].

Because the inequality above holds for all $\varepsilon > 0$, we have $|A \cup B| \ge |A| + |B|$, which implies that $|A \cup B| = |A| + |B|$.


You have probably long suspected that not every subset of R is a Borel set. Now 
we can prove this suspicion. 


2.67 existence of a subset of R that is not a Borel set 


There exists a set $B \subseteq \mathbf{R}$ such that $|B| < \infty$ and $B$ is not a Borel set.

Proof In the proof of 2.18, we showed that there exist disjoint sets $A, B \subseteq \mathbf{R}$ such that $|A \cup B| \neq |A| + |B|$. For any such sets, we must have $|B| < \infty$ because otherwise both $|A \cup B|$ and $|A| + |B|$ equal $\infty$ (as follows from the inequality $|B| \le |A \cup B|$). Now 2.66 implies that $B$ is not a Borel set.
The tools we have constructed now allow us to prove that outer measure, when
restricted to the Borel sets, is a measure.


2.68 outer measure is a measure on Borel sets 


Outer measure is a measure on $(\mathbf{R}, \mathcal{B})$, where $\mathcal{B}$ is the $\sigma$-algebra of Borel subsets of $\mathbf{R}$.

Proof Suppose $B_1, B_2, \ldots$ is a disjoint sequence of Borel subsets of $\mathbf{R}$. Then for each $n \in \mathbf{Z}^+$ we have
$$\begin{aligned}
\Bigl|\bigcup_{k=1}^{\infty} B_k\Bigr| &\ge \Bigl|\bigcup_{k=1}^{n} B_k\Bigr| \\
&= \sum_{k=1}^{n} |B_k|,
\end{aligned}$$
where the first line above follows from 2.5 and the last line follows from 2.66 (and induction on $n$). Taking the limit as $n \to \infty$, we have $\bigl|\bigcup_{k=1}^{\infty} B_k\bigr| \ge \sum_{k=1}^{\infty} |B_k|$. The inequality in the other direction follows from countable subadditivity of outer measure (2.8). Hence
$$\Bigl|\bigcup_{k=1}^{\infty} B_k\Bigr| = \sum_{k=1}^{\infty} |B_k|.$$
Thus outer measure is a measure on the $\sigma$-algebra of Borel subsets of $\mathbf{R}$.


The result above implies that the next definition makes sense. 


2.69 Definition Lebesgue measure

Lebesgue measure is the measure on $(\mathbf{R}, \mathcal{B})$, where $\mathcal{B}$ is the $\sigma$-algebra of Borel subsets of $\mathbf{R}$, that assigns to each Borel set its outer measure.


In other words, the Lebesgue measure of a set is the same as its outer measure, 
except that the term Lebesgue measure should not be applied to arbitrary sets but 
only to Borel sets (and also to what are called Lebesgue measurable sets, as we will 
soon see). Unlike outer measure, Lebesgue measure is actually a measure, as shown 
in 2.68. Lebesgue measure is named in honor of its inventor, Henri Lebesgue. 


The cathedral in Beauvais, the French city where Henri Lebesgue (1875-1941) was born. Much of what we call Lebesgue measure and Lebesgue integration was developed by Lebesgue in his 1902 PhD thesis. Émile Borel was Lebesgue's PhD thesis advisor. CC-BY-SA James Mitchell
Lebesgue Measurable Sets


We have accomplished the major goal of this section, which was to show that outer 
measure restricted to Borel sets is a measure. As we will see in this subsection, outer 
measure is actually a measure on a somewhat larger class of sets called the Lebesgue 
measurable sets. 

The mathematics literature contains many different definitions of a Lebesgue 
measurable set. These definitions are all equivalent—the definition of a Lebesgue 
measurable set in one approach becomes a theorem in another approach. The ap- 
proach chosen here has the advantage of emphasizing that a Lebesgue measurable set 
differs from a Borel set by a set with outer measure 0. The attitude here is that sets 
with outer measure 0 should be considered small sets that do not matter much. 


2.70 Definition Lebesgue measurable set 


A set $A \subseteq \mathbf{R}$ is called Lebesgue measurable if there exists a Borel set $B \subseteq A$ such that $|A \setminus B| = 0$.


Every Borel set is Lebesgue measurable because if A C R is a Borel set, then we 
can take B = A in the definition above. 

The result below gives several equivalent conditions for being Lebesgue measur- 
able. The equivalence of (a) and (d) is just our definition and thus is not discussed in 
the proof. 

Although there exist Lebesgue measurable sets that are not Borel sets, you are unlikely to encounter one. The most important application of the result below is that if $A \subseteq \mathbf{R}$ is a Borel set, then $A$ satisfies conditions (b), (c), (e), and (f). Condition (c) implies that every Borel set is almost a countable union of closed sets, and condition (f) implies that every Borel set is almost a countable intersection of open sets.


2.71 equivalences for being a Lebesgue measurable set
Suppose $A \subseteq \mathbf{R}$. Then the following are equivalent:

(a) $A$ is Lebesgue measurable.

(b) For each $\varepsilon > 0$, there exists a closed set $F \subseteq A$ with $|A \setminus F| < \varepsilon$.

(c) There exist closed sets $F_1, F_2, \ldots$ contained in $A$ such that $\bigl|A \setminus \bigcup_{k=1}^{\infty} F_k\bigr| = 0$.

(d) There exists a Borel set $B \subseteq A$ such that $|A \setminus B| = 0$.

(e) For each $\varepsilon > 0$, there exists an open set $G \supseteq A$ such that $|G \setminus A| < \varepsilon$.

(f) There exist open sets $G_1, G_2, \ldots$ containing $A$ such that $\bigl|\bigl(\bigcap_{k=1}^{\infty} G_k\bigr) \setminus A\bigr| = 0$.

(g) There exists a Borel set $B \supseteq A$ such that $|B \setminus A| = 0$.


Proof Let $\mathcal{L}$ denote the collection of sets $A \subseteq \mathbf{R}$ that satisfy (b). We have already proved that every Borel set is in $\mathcal{L}$ (see 2.65). As a key part of that proof, which we will freely use in this proof, we showed that $\mathcal{L}$ is a $\sigma$-algebra on $\mathbf{R}$ (see the proof of 2.65). In addition to containing the Borel sets, $\mathcal{L}$ contains every set with outer measure 0 [because if $|A| = 0$, we can take $F = \emptyset$ in (b)].

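(b) $\Longrightarrow$ (c): Suppose (b) holds. Thus for each $n \in \mathbf{Z}^+$, there exists a closed set $F_n \subseteq A$ such that $|A \setminus F_n| < \frac{1}{n}$. Now
$$A \setminus \bigcup_{k=1}^{\infty} F_k \subseteq A \setminus F_n$$
for each $n \in \mathbf{Z}^+$. Thus $\bigl|A \setminus \bigcup_{k=1}^{\infty} F_k\bigr| \le |A \setminus F_n| < \frac{1}{n}$ for each $n \in \mathbf{Z}^+$. Hence $\bigl|A \setminus \bigcup_{k=1}^{\infty} F_k\bigr| = 0$, completing the proof that (b) implies (c).

(c) $\Longrightarrow$ (d): Because every countable union of closed sets is a Borel set, we see that (c) implies (d).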

(d) $\Longrightarrow$ (b): Suppose (d) holds. Thus there exists a Borel set $B \subseteq A$ such that $|A \setminus B| = 0$. Now
$$A = B \cup (A \setminus B).$$
We know that $B \in \mathcal{L}$ (because $B$ is a Borel set) and $A \setminus B \in \mathcal{L}$ (because $A \setminus B$ has outer measure 0). Because $\mathcal{L}$ is a $\sigma$-algebra, the displayed equation above implies that $A \in \mathcal{L}$. In other words, (b) holds, completing the proof that (d) implies (b).

At this stage of the proof, we now know that (b) $\Longleftrightarrow$ (c) $\Longleftrightarrow$ (d).

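(b) $\Longrightarrow$ (e): Suppose (b) holds. Thus $A \in \mathcal{L}$. Let $\varepsilon > 0$. Then because $\mathbf{R} \setminus A \in \mathcal{L}$ (which holds because $\mathcal{L}$ is closed under complementation), there exists a closed set $F \subseteq \mathbf{R} \setminus A$ such that
$$|(\mathbf{R} \setminus A) \setminus F| < \varepsilon.$$
Now $\mathbf{R} \setminus F$ is an open set with $\mathbf{R} \setminus F \supseteq A$. Because $(\mathbf{R} \setminus F) \setminus A = (\mathbf{R} \setminus A) \setminus F$, the inequality above implies that $|(\mathbf{R} \setminus F) \setminus A| < \varepsilon$. Thus (e) holds, completing the proof that (b) implies (e).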

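(e) $\Longrightarrow$ (f): Suppose (e) holds. Thus for each $n \in \mathbf{Z}^+$, there exists an open set $G_n \supseteq A$ such that $|G_n \setminus A| < \frac{1}{n}$. Now
$$\Bigl(\bigcap_{k=1}^{\infty} G_k\Bigr) \setminus A \subseteq G_n \setminus A$$
for each $n \in \mathbf{Z}^+$. Thus $\bigl|\bigl(\bigcap_{k=1}^{\infty} G_k\bigr) \setminus A\bigr| \le |G_n \setminus A| < \frac{1}{n}$ for each $n \in \mathbf{Z}^+$. Hence $\bigl|\bigl(\bigcap_{k=1}^{\infty} G_k\bigr) \setminus A\bigr| = 0$, completing the proof that (e) implies (f).

(f) $\Longrightarrow$ (g): Because every countable intersection of open sets is a Borel set, we see that (f) implies (g).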

(g) $\Longrightarrow$ (b): Suppose (g) holds. Thus there exists a Borel set $B \supseteq A$ such that $|B \setminus A| = 0$. Now
$$A = B \cap \bigl(\mathbf{R} \setminus (B \setminus A)\bigr).$$
We know that $B \in \mathcal{L}$ (because $B$ is a Borel set) and $\mathbf{R} \setminus (B \setminus A) \in \mathcal{L}$ (because this set is the complement of a set with outer measure 0). Because $\mathcal{L}$ is a $\sigma$-algebra, the displayed equation above implies that $A \in \mathcal{L}$. In other words, (b) holds, completing the proof that (g) implies (b).

Our chain of implications now shows that (b) through (g) are all equivalent. 
In addition to the equivalences in the previous result, see Exercise 13 in this section for another condition that is equivalent to being Lebesgue measurable. Also see Exercise 6, which shows that a set with finite outer measure is Lebesgue measurable if and only if it is almost a finite disjoint union of bounded open intervals.

In practice, the most useful part of Exercise 6 is the result that every Borel set with finite measure is almost a finite disjoint union of bounded open intervals.

Now we can show that outer measure is a measure on the Lebesgue measurable sets.


2.72 outer measure is a measure on Lebesgue measurable sets


(a) The set $\mathcal{L}$ of Lebesgue measurable subsets of $\mathbf{R}$ is a $\sigma$-algebra on $\mathbf{R}$.

(b) Outer measure is a measure on $(\mathbf{R}, \mathcal{L})$.


Proof Because (a) and (b) are equivalent in 2.71, the set $\mathcal{L}$ of Lebesgue measurable subsets of $\mathbf{R}$ is the collection of sets satisfying (b) in 2.71. As noted in the first paragraph of the proof of 2.71, this set is a $\sigma$-algebra on $\mathbf{R}$, proving (a).

To prove (b), suppose $A_1, A_2, \ldots$ is a disjoint sequence of Lebesgue measurable sets. By the definition of Lebesgue measurable set (2.70), for each $k \in \mathbf{Z}^+$ there exists a Borel set $B_k \subseteq A_k$ such that $|A_k \setminus B_k| = 0$. Now
$$\begin{aligned}
\Bigl|\bigcup_{k=1}^{\infty} A_k\Bigr| &\ge \Bigl|\bigcup_{k=1}^{\infty} B_k\Bigr| \\
&= \sum_{k=1}^{\infty} |B_k| \\
&= \sum_{k=1}^{\infty} |A_k|,
\end{aligned}$$
where the second line above holds because $B_1, B_2, \ldots$ is a disjoint sequence of Borel sets and outer measure is a measure on the Borel sets (see 2.68); the last line above holds because $B_k \subseteq A_k$ (so $|B_k| \le |A_k|$) and by subadditivity of outer measure (see 2.8) we have
$$|A_k| = |B_k \cup (A_k \setminus B_k)| \le |B_k| + |A_k \setminus B_k| = |B_k|.$$
The inequality $\bigl|\bigcup_{k=1}^{\infty} A_k\bigr| \ge \sum_{k=1}^{\infty} |A_k|$ obtained above, combined with countable subadditivity of outer measure (see 2.8), implies that $\bigl|\bigcup_{k=1}^{\infty} A_k\bigr| = \sum_{k=1}^{\infty} |A_k|$, completing the proof of (b).


If $A$ is a set with outer measure 0, then $A$ is Lebesgue measurable (because we can take $B = \emptyset$ in the definition 2.70). Our definition of the Lebesgue measurable sets thus implies that the set of Lebesgue measurable sets is the smallest $\sigma$-algebra on $\mathbf{R}$ containing the Borel sets and the sets with outer measure 0. Thus the set of Lebesgue measurable sets is also the smallest $\sigma$-algebra on $\mathbf{R}$ containing the open sets and the sets with outer measure 0.

Because outer measure is not even finitely additive (see 2.18), 2.72(b) implies that 
there exist subsets of R that are not Lebesgue measurable. 
We previously defined Lebesgue measure as outer measure restricted to the Borel
sets (see 2.69). The term Lebesgue measure is sometimes used in mathematical
literature with the meaning as we previously defined it and is sometimes used with
the following meaning.


2.73 Definition Lebesgue measure 


Lebesgue measure is the measure on $(\mathbf{R}, \mathcal{L})$, where $\mathcal{L}$ is the $\sigma$-algebra of Lebesgue measurable subsets of $\mathbf{R}$, that assigns to each Lebesgue measurable set its outer measure.


The two definitions of Lebesgue measure disagree only on the domain of the measure: is the $\sigma$-algebra the Borel sets or the Lebesgue measurable sets? You may be able to tell which is intended from the context. In this book, the domain is specified unless it is irrelevant.

If you are reading a mathematics paper and the domain for Lebesgue measure 
is not specified, then it probably does not matter whether you use the Borel sets 
or the Lebesgue measurable sets (because every Lebesgue measurable set differs 
from a Borel set by a set with outer measure 0, and when dealing with measures, 
what happens on a set with measure 0 usually does not matter). Because all sets that 
arise from the usual operations of analysis are Borel sets, you may want to assume 
that Lebesgue measure means outer measure on the Borel sets, unless what you are 
reading explicitly states otherwise. 

A mathematics paper may also refer to a measurable subset of $\mathbf{R}$, without further explanation. Unless some other $\sigma$-algebra is clear from the context, the author probably means the Borel sets or the Lebesgue measurable sets. Again, the choice probably does not matter, but using the Borel sets can be cleaner and simpler.

Lebesgue measure on the Lebesgue measurable sets does have one small advantage 
over Lebesgue measure on the Borel sets: every subset of a set with (outer) measure 
0 is Lebesgue measurable but is not necessarily a Borel set. However, any natural 
process that produces a subset of R will produce a Borel set. Thus this small 
advantage does not often come up in practice. 


The emphasis in some textbooks on Lebesgue measurable sets instead of Borel sets probably stems from the historical development of the subject, rather than from any common use of Lebesgue measurable sets that are not Borel sets.


Cantor Set and Cantor Function 


Every countable set has outer measure 0 (see 2.4). A reasonable question arises 
about whether the converse holds. In other words, is every set with outer measure 
0 countable? The Cantor set, which is introduced in this subsection, provides the 
answer to this question. 

The Cantor set also gives counterexamples to other reasonable conjectures. For 
example, Exercise 17 in this section shows that the sum of two sets with Lebesgue 
measure 0 can have positive Lebesgue measure. 
2.74 Definition Cantor set


The Cantor set $C$ is $[0, 1] \setminus \bigl(\bigcup_{n=1}^{\infty} G_n\bigr)$, where $G_1 = (\frac{1}{3}, \frac{2}{3})$ and $G_n$ for $n > 1$ is the union of the middle-third open intervals in the intervals of $[0, 1] \setminus \bigl(\bigcup_{j=1}^{n-1} G_j\bigr)$.


One way to envision the Cantor set $C$ is to start with the interval $[0, 1]$ and then consider the process that removes at each step the middle-third open intervals of all intervals left from the previous step. At the first step, we remove $G_1 = (\frac{1}{3}, \frac{2}{3})$.

[Figure: $G_1$ is shown in red.]

After that first step, we have $[0, 1] \setminus G_1 = [0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$. Thus we take the middle-third open intervals of $[0, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$. In other words, we have
$$G_2 = (\tfrac{1}{9}, \tfrac{2}{9}) \cup (\tfrac{7}{9}, \tfrac{8}{9}).$$

[Figure: $G_1 \cup G_2$ is shown in red.]

Now $[0, 1] \setminus (G_1 \cup G_2) = [0, \frac{1}{9}] \cup [\frac{2}{9}, \frac{1}{3}] \cup [\frac{2}{3}, \frac{7}{9}] \cup [\frac{8}{9}, 1]$. Thus
$$G_3 = (\tfrac{1}{27}, \tfrac{2}{27}) \cup (\tfrac{7}{27}, \tfrac{8}{27}) \cup (\tfrac{19}{27}, \tfrac{20}{27}) \cup (\tfrac{25}{27}, \tfrac{26}{27}).$$

[Figure: $G_1 \cup G_2 \cup G_3$ is shown in red.]
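The construction of $G_1$, $G_2$, $G_3$ above is mechanical enough to automate. The sketch below (the function name is hypothetical) uses exact rational arithmetic from Python's `fractions` module to produce the open intervals whose union is $G_n$, processing the remaining closed intervals from left to right.

```python
from fractions import Fraction


def cantor_removed_intervals(n):
    """Open intervals (left, right) whose union is G_n in 2.74, listed from left to right."""
    remaining = [(Fraction(0), Fraction(1))]  # closed intervals left before the current step
    removed = []
    for _ in range(n):
        removed, next_remaining = [], []
        for left, right in remaining:
            third = (right - left) / 3
            removed.append((left + third, right - third))  # the middle-third open interval
            next_remaining += [(left, left + third), (right - third, right)]
        remaining = next_remaining
    return removed


for n in (1, 2, 3):
    print(n, [f"({a}, {b})" for a, b in cantor_removed_intervals(n)])
```

Each step doubles the number of intervals and divides their length by 3, so $G_n$ consists of $2^{n-1}$ intervals of length $1/3^n$.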


Base 3 representations provide a useful way to think about the Cantor set. Just as $\frac{1}{10} = 0.1 = 0.09999\ldots$ in the decimal representation, base 3 representations are not unique for fractions whose denominator is a power of 3. For example, $\frac{1}{3} = 0.1_3 = 0.02222\ldots_3$, where the subscript 3 denotes a base 3 representation.

Notice that $G_1$ is the set of numbers in $[0, 1]$ whose base 3 representations have 1 in the first digit after the decimal point (for those numbers that have two base 3 representations, this means both such representations must have 1 in the first digit). Also, $G_1 \cup G_2$ is the set of numbers in $[0, 1]$ whose base 3 representations have 1 in the first digit or the second digit after the decimal point. And so on. Hence $\bigcup_{n=1}^{\infty} G_n$ is the set of numbers in $[0, 1]$ whose base 3 representations have a 1 somewhere.

Thus we have the following description of the Cantor set. In the following result, the phrase a base 3 representation indicates that if a number has two base 3 representations, then it is in the Cantor set if and only if at least one of them contains no 1s. For example, both $\frac{1}{3}$ (which equals $0.02222\ldots_3$ and equals $0.1_3$) and $\frac{2}{3}$ (which equals $0.2_3$ and equals $0.12222\ldots_3$) are in the Cantor set.
2.75 base 3 description of the Cantor set


The Cantor set $C$ is the set of numbers in $[0, 1]$ that have a base 3 representation containing only 0s and 2s.
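For rational numbers, 2.75 turns membership in $C$ into a finite computation, because the base 3 expansion of a rational number eventually repeats. The sketch below assumes exact `Fraction` inputs; it computes the standard (greedy) expansion, detects the repeating block by remembering remainders, and then accounts for the second representation that exists when the expansion terminates. The function names are hypothetical.

```python
from fractions import Fraction


def base3_expansion(x):
    """Greedy base 3 digits of a rational x in [0, 1).

    Returns (digits, start): if start is None the expansion terminates after
    `digits`; otherwise digits[start:] is the block that repeats forever.
    """
    digits, seen = [], {}
    while x != 0:
        if x in seen:
            return digits, seen[x]
        seen[x] = len(digits)
        x *= 3
        d = x.numerator // x.denominator  # floor(3x), one of 0, 1, 2
        digits.append(d)
        x -= d
    return digits, None


def in_cantor_set(x):
    """Decide whether a rational x lies in C, using the base 3 description 2.75."""
    x = Fraction(x)
    if x < 0 or x > 1:
        return False
    if x == 1:
        return True  # 1 = 0.2222..._3
    digits, start = base3_expansion(x)
    if 1 not in digits:
        return True
    # A second base 3 representation exists only when the greedy one terminates:
    # 0.d_1...d_k 1_3 equals 0.d_1...d_k 0222..._3, so x is in C exactly when
    # that final digit is the only 1.
    return start is None and digits.index(1) == len(digits) - 1


print(in_cantor_set(Fraction(1, 3)), in_cantor_set(Fraction(1, 2)), in_cantor_set(Fraction(1, 10)))  # True False True
```

For instance, $\frac{1}{10} = 0.00220022\ldots_3$ lies in $C$, while $\frac{1}{2} = 0.1111\ldots_3$ does not.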


The two endpoints of each interval in each $G_n$ are in the Cantor set. However, many elements of the Cantor set are not endpoints of any interval in any $G_n$. For example, Exercise 14 asks you to show that two particular numbers are in the Cantor set; neither of those numbers is an endpoint of any interval in any $G_n$. An example of an irrational number in the Cantor set is $\sum_{n=1}^{\infty} \frac{2}{3^{n!}}$.

It is unknown whether or not every number in the Cantor set is either rational or transcendental (meaning not the root of a polynomial with integer coefficients).

The next result gives some elementary properties of the Cantor set.


2.76 C is closed, has measure 0, and contains no nontrivial intervals


(a) The Cantor set is a closed subset of R. 


(b) The Cantor set has Lebesgue measure 0. 


(c) The Cantor set contains no interval with more than one element. 


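Proof Each set $G_n$ used in the definition of the Cantor set is a union of open intervals. Thus each $G_n$ is open. Thus $\bigcup_{n=1}^{\infty} G_n$ is open, and hence its complement is closed. The Cantor set equals $[0, 1] \cap \bigl(\mathbf{R} \setminus \bigcup_{n=1}^{\infty} G_n\bigr)$, which is the intersection of two closed sets. Thus the Cantor set is closed, completing the proof of (a).

By induction on $n$, each $G_n$ is the union of $2^{n-1}$ disjoint open intervals, each of which has length $\frac{1}{3^n}$. Thus $|G_n| = \frac{2^{n-1}}{3^n}$. The sets $G_1, G_2, \ldots$ are disjoint. Hence
$$\begin{aligned}
\Bigl|\bigcup_{n=1}^{\infty} G_n\Bigr| &= \frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots \\
&= \frac{1}{3}\Bigl(1 + \frac{2}{3} + \frac{4}{9} + \cdots\Bigr) \\
&= \frac{1}{3} \cdot \frac{1}{1 - \frac{2}{3}} \\
&= 1.
\end{aligned}$$
Thus the Cantor set, which equals $[0, 1] \setminus \bigcup_{n=1}^{\infty} G_n$, has Lebesgue measure $1 - 1$ [by 2.57(b)]. In other words, the Cantor set has Lebesgue measure 0, completing the proof of (b).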

A set with Lebesgue measure 0 cannot contain an interval that has more than one 
element. Thus (b) implies (c). 
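The geometric series in the proof of (b) can be checked with exact arithmetic. The sketch below sums $|G_1| + \cdots + |G_N|$ using the counts from the proof ($2^{n-1}$ intervals of length $1/3^n$ in $G_n$) and compares the result with the closed form $1 - (2/3)^N$. So the part of $[0, 1]$ that survives the first $N$ steps has length $(2/3)^N$, which tends to 0.

```python
from fractions import Fraction


def removed_length(N):
    """|G_1| + ... + |G_N|, where G_n consists of 2**(n-1) intervals of length 1/3**n."""
    return sum((Fraction(2 ** (n - 1), 3 ** n) for n in range(1, N + 1)), Fraction(0))


for N in (1, 2, 10):
    remaining = 1 - removed_length(N)  # length of what survives after N steps
    print(N, removed_length(N), remaining == Fraction(2, 3) ** N)
```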
Now we can define an amazing function.


2.77 Definition Cantor function 


The Cantor function $\Lambda \colon [0, 1] \to [0, 1]$ is defined by converting base 3 representations into base 2 representations as follows:

• If $x \in C$, then $\Lambda(x)$ is computed from the unique base 3 representation of $x$ containing only 0s and 2s by replacing each 2 by 1 and interpreting the resulting string as a base 2 number.

• If $x \in [0, 1] \setminus C$, then $\Lambda(x)$ is computed from a base 3 representation of $x$ by truncating after the first 1, replacing each 2 before the first 1 by 1, and interpreting the resulting string as a base 2 number.
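Definition 2.77 can be turned into a short program for rational inputs. The sketch below assumes exact `Fraction` inputs in $[0, 1]$ (function names hypothetical). It computes the greedy base 3 expansion with repetition detection, truncates after the first 1 when there is one, and otherwise reads a possibly repeating string of 0s and 2s as a repeating binary expansion. Using the greedy expansion is harmless for points of $C$ such as $\frac{1}{3} = 0.1_3$: truncating $0.1_3$ gives $0.1_2 = \frac{1}{2}$, the same value as reading $0.0222\ldots_3$ as $0.0111\ldots_2$.

```python
from fractions import Fraction


def base3_expansion(x):
    """Greedy base 3 digits of a rational x in [0, 1).

    Returns (digits, start): if start is None the expansion terminates after
    `digits`; otherwise digits[start:] is the block that repeats forever.
    """
    digits, seen = [], {}
    while x != 0:
        if x in seen:
            return digits, seen[x]
        seen[x] = len(digits)
        x *= 3
        d = x.numerator // x.denominator  # floor(3x), one of 0, 1, 2
        digits.append(d)
        x -= d
    return digits, None


def cantor_function(x):
    """Lambda(x) for a rational x in [0, 1], following Definition 2.77."""
    x = Fraction(x)
    if x == 1:
        return Fraction(1)  # 1 = 0.222..._3 becomes 0.111..._2 = 1
    digits, start = base3_expansion(x)
    if 1 in digits:
        digits = digits[:digits.index(1) + 1]  # truncate after the first 1
        start = None                           # the string is now finite
    bits = [1 if d else 0 for d in digits]     # 2 -> 1 and 0 -> 0; a final 1 stays 1
    prefix = bits if start is None else bits[:start]
    value = sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(prefix)), Fraction(0))
    if start is not None:
        period = bits[start:]
        block = int("".join(map(str, period)), 2)  # repeating bits read as an integer
        value += Fraction(block, 2 ** start * (2 ** len(period) - 1))
    return value


print(cantor_function(Fraction(20, 81)), cantor_function(Fraction(1, 4)))  # 5/16 1/3
```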


2.78 Example values of the Cantor function


• $\Lambda(0.0202_3) = 0.0101_2$; in other words, $\Lambda(\frac{20}{81}) = \frac{5}{16}$.

• $\Lambda(0.220121_3) = 0.1101_2$; in other words, $\Lambda(\frac{664}{729}) = \frac{13}{16}$.

• Suppose $x \in (\frac{1}{3}, \frac{2}{3})$. Then $x \notin C$ because $x$ was removed in the first step of the definition of the Cantor set. Each base 3 representation of $x$ begins with $0.1$. Thus we truncate and interpret $0.1$ as a base 2 number, getting $\frac{1}{2}$. Hence the Cantor function $\Lambda$ has the constant value $\frac{1}{2}$ on the interval $(\frac{1}{3}, \frac{2}{3})$, as shown on the graph below.

• Suppose $x \in (\frac{7}{9}, \frac{8}{9})$. Then $x \notin C$ because $x$ was removed in the second step of the definition of the Cantor set. Each base 3 representation of $x$ begins with $0.21$. Thus we truncate, replace the 2 by 1, and interpret $0.11$ as a base 2 number, getting $\frac{3}{4}$. Hence the Cantor function $\Lambda$ has the constant value $\frac{3}{4}$ on the interval $(\frac{7}{9}, \frac{8}{9})$, as shown on the graph below.

[Figure: graph of the Cantor function $\Lambda$ on $[0, 1]$.]
As shown in the next result, in some mysterious fashion the Cantor function manages to map $[0, 1]$ onto $[0, 1]$ even though the Cantor function is constant on each open interval in the complement of the Cantor set (see the graph in Example 2.78).


2.79 Cantor function 


The Cantor function $\Lambda$ is a continuous, increasing function from $[0, 1]$ onto $[0, 1]$. Furthermore, $\Lambda(C) = [0, 1]$.


Proof We begin by showing that $\Lambda(C) = [0, 1]$. To do this, suppose $y \in [0, 1]$. In the base 2 representation of $y$, replace each 1 by 2 and interpret the resulting string in base 3, getting a number $x \in [0, 1]$. Because $x$ has a base 3 representation consisting only of 0s and 2s, the number $x$ is in the Cantor set $C$. The definition of the Cantor function shows that $\Lambda(x) = y$. Thus $y \in \Lambda(C)$. Hence $\Lambda(C) = [0, 1]$, as desired.

Some careful thinking about the meaning of base 3 and base 2 representations and 
the definition of the Cantor function shows that A is an increasing function. This step 
is left to the reader. 

If $x \in [0, 1] \setminus C$, then the Cantor function $\Lambda$ is constant on an open interval containing $x$ and thus $\Lambda$ is continuous at $x$. If $x \in C$, then again some careful thinking about base 3 and base 2 representations shows that $\Lambda$ is continuous at $x$.

Alternatively, you can skip the paragraph above and note that an increasing 
function on [0,1] whose range equals [0,1] is automatically continuous (although 
you should think about why that holds). 
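The first paragraph of the proof of 2.79 is constructive. For a number $y$ whose base 2 expansion terminates (a dyadic rational; this restriction is an assumption made here to keep the expansion finite), the sketch below replaces each binary digit 1 by 2 and reads the result in base 3, producing a point $x \in C$ with $\Lambda(x) = y$. The function names are hypothetical.

```python
from fractions import Fraction


def binary_digits(y, max_digits=64):
    """Base 2 digits of a dyadic rational y in [0, 1); the expansion must terminate."""
    y = Fraction(y)
    if not 0 <= y < 1:
        raise ValueError("y must lie in [0, 1)")
    digits = []
    while y != 0:
        if len(digits) == max_digits:
            raise ValueError("expansion does not terminate within max_digits digits")
        y *= 2
        d = y.numerator // y.denominator  # floor(2y), either 0 or 1
        digits.append(d)
        y -= d
    return digits


def cantor_preimage(y):
    """x in C with Lambda(x) = y: binary digits of y, with 1 -> 2, read in base 3."""
    return sum((Fraction(2 * b, 3 ** (i + 1)) for i, b in enumerate(binary_digits(y))), Fraction(0))


print(cantor_preimage(Fraction(5, 16)))  # 20/81, matching the first value in Example 2.78
```

Non-dyadic values such as $y = \frac{1}{3} = 0.0101\ldots_2$ are rejected by this sketch, even though the proof treats them the same way: reading $0.0202\ldots_3$ gives $x = \frac{1}{4}$.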


Now we can use the Cantor function to show that the Cantor set is uncountable 
even though it is a closed set with outer measure 0. 


2.80 C is uncountable


The Cantor set is uncountable. 


Proof If $C$ were countable, then $\Lambda(C)$ would be countable. However, 2.79 shows that $\Lambda(C) = [0, 1]$, which is uncountable.


As we see in the next result, the Cantor function shows that even a continuous function can map a set with Lebesgue measure 0 to a set that is not Lebesgue measurable.


2.81 continuous image of a Lebesgue measurable set can be nonmeasurable 


There exists a Lebesgue measurable set $A \subseteq [0, 1]$ such that $|A| = 0$ and $\Lambda(A)$ is not a Lebesgue measurable set.


Proof Let $E$ be a subset of $[0, 1]$ that is not Lebesgue measurable (the existence of such a set follows from the discussion after 2.72). Let $A = C \cap \Lambda^{-1}(E)$. Then $|A| = 0$ because $A \subseteq C$ and $|C| = 0$ (by 2.76). Thus $A$ is Lebesgue measurable because every subset of $\mathbf{R}$ with outer measure 0 is Lebesgue measurable. Because $\Lambda$ maps $C$ onto $[0, 1]$ (see 2.79), we have $\Lambda(A) = E$.


EXERCISES 2D


1 (a) Show that the set consisting of those numbers in $(0, 1)$ that have a decimal expansion containing one hundred consecutive 4s is a Borel subset of $\mathbf{R}$.

(b) What is the Lebesgue measure of the set in part (a)?

2 Prove that there exists a bounded set $A \subseteq \mathbf{R}$ such that $|F| \le |A| - 1$ for every closed set $F \subseteq A$.

3 Prove that there exists a set $A \subseteq \mathbf{R}$ such that $|G \setminus A| = \infty$ for every open set $G$ that contains $A$.


4 The phrase nontrivial interval is used to denote an interval of $\mathbf{R}$ that contains more than one element. Recall that an interval might be open, closed, or neither.


(a) Prove that the union of each collection of nontrivial intervals of R is the 
union of a countable subset of that collection. 


(b) Prove that the union of each collection of nontrivial intervals of R is a Borel 
set. 


(c) Prove that there exists a collection of closed intervals of R whose union is 
not a Borel set. 


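5 Prove that if $A \subseteq \mathbf{R}$ is Lebesgue measurable, then there exists an increasing sequence $F_1 \subseteq F_2 \subseteq \cdots$ of closed sets contained in $A$ such that
$$\Bigl|A \setminus \bigcup_{k=1}^{\infty} F_k\Bigr| = 0.$$

6 Suppose $A \subseteq \mathbf{R}$ and $|A| < \infty$. Prove that $A$ is Lebesgue measurable if and only if for every $\varepsilon > 0$ there exists a set $G$ that is the union of finitely many disjoint bounded open intervals such that $|A \setminus G| + |G \setminus A| < \varepsilon$.

7 Prove that if $A \subseteq \mathbf{R}$ is Lebesgue measurable, then there exists a decreasing sequence $G_1 \supseteq G_2 \supseteq \cdots$ of open sets containing $A$ such that
$$\Bigl|\Bigl(\bigcap_{k=1}^{\infty} G_k\Bigr) \setminus A\Bigr| = 0.$$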


8 Prove that the collection of Lebesgue measurable subsets of $\mathbf{R}$ is translation invariant. More precisely, prove that if $A \subseteq \mathbf{R}$ is Lebesgue measurable and $t \in \mathbf{R}$, then $t + A$ is Lebesgue measurable.

9 Prove that the collection of Lebesgue measurable subsets of $\mathbf{R}$ is dilation invariant. More precisely, prove that if $A \subseteq \mathbf{R}$ is Lebesgue measurable and $t \in \mathbf{R}$, then $tA$ (which is defined to be $\{ta : a \in A\}$) is Lebesgue measurable.

10 Prove that if $A$ and $B$ are disjoint subsets of $\mathbf{R}$ and $B$ is Lebesgue measurable, then $|A \cup B| = |A| + |B|$.


11 Prove that if $A \subseteq \mathbf{R}$ and $|A| > 0$, then there exists a subset of $A$ that is not Lebesgue measurable.


12 Suppose $b < c$ and $A \subseteq (b, c)$. Prove that $A$ is Lebesgue measurable if and only if $|A| + |(b, c) \setminus A| = c - b$.

13 Suppose $A \subseteq \mathbf{R}$. Prove that $A$ is Lebesgue measurable if and only if
$$|(-n, n) \cap A| + |(-n, n) \setminus A| = 2n$$
for every $n \in \mathbf{Z}^+$.


14 Show that i and a are both in the Cantor set.

15 Show that % is not in the Cantor set.

16 List the eight open intervals whose union is $G_4$ in the definition of the Cantor set (2.74).

17 Let $C$ denote the Cantor set. Prove that $\frac{1}{2}C + \frac{1}{2}C = [0, 1]$.


18 Prove that every open interval of $\mathbf{R}$ contains either infinitely many or no elements in the Cantor set.

19 Evaluate $\int_0^1 \Lambda$, where $\Lambda$ is the Cantor function.


20 Evaluate each of the following:


(a) A(z); 
(b) A(0.93). 


21 Find each of the following sets:

(a) A7'({3}); 

(b) Av'({75})- 

(a) Suppose x is a rational number in [0,1]. Explain why A(x) is rational. 
(b) Suppose x € C is such that A(x) is rational. Explain why x is rational. 


Show that there exists a function f: R — R such that the image under f of 
every nonempty open interval is R. 


For A C R, the quantity 
sup{|F| : F is a closed bounded subset of R and F c A} 
is called the inner measure of A. 


(a) Show that if A is a Lebesgue measurable subset of R, then the inner measure 
of A equals the outer measure of A. 


(b) Show that inner measure is not a measure on the o-algebra of all subsets 
of R. 
2E Convergence of Measurable Functions


Recall that a measurable space is a pair $(X, \mathcal{S})$, where $X$ is a set and $\mathcal{S}$ is a $\sigma$-algebra on $X$. We defined a function $f\colon X \to \mathbf{R}$ to be $\mathcal{S}$-measurable if $f^{-1}(B) \in \mathcal{S}$ for every Borel set $B \subset \mathbf{R}$. In Section 2B we proved some results about $\mathcal{S}$-measurable functions; this was before we had introduced the notion of a measure.

In this section, we return to study measurable functions, but now with an emphasis 
on results that depend upon measures. The highlights of this section are the proofs of 
Egorov’s Theorem and Luzin’s Theorem. 


Pointwise and Uniform Convergence 
We begin this section with some definitions that you probably saw in an earlier course. 
2.82 Definition pointwise convergence; uniform convergence 


Suppose $X$ is a set, $f_1, f_2, \dots$ is a sequence of functions from $X$ to $\mathbf{R}$, and $f$ is a function from $X$ to $\mathbf{R}$.

• The sequence $f_1, f_2, \dots$ converges pointwise on $X$ to $f$ if
$$\lim_{k \to \infty} f_k(x) = f(x)$$
for each $x \in X$.
In other words, $f_1, f_2, \dots$ converges pointwise on $X$ to $f$ if for each $x \in X$ and every $\varepsilon > 0$, there exists $n \in \mathbf{Z}^+$ such that $|f_k(x) - f(x)| < \varepsilon$ for all integers $k \ge n$.

• The sequence $f_1, f_2, \dots$ converges uniformly on $X$ to $f$ if for every $\varepsilon > 0$, there exists $n \in \mathbf{Z}^+$ such that $|f_k(x) - f(x)| < \varepsilon$ for all integers $k \ge n$ and all $x \in X$.


2.83 Example a sequence converging pointwise but not uniformly 


Suppose $f_k\colon [-1, 1] \to \mathbf{R}$ is the function whose graph is shown here and $f\colon [-1, 1] \to \mathbf{R}$ is the function defined by
$$f(x) = \begin{cases} 1 & \text{if } x \neq 0, \\ 2 & \text{if } x = 0. \end{cases}$$
Then $f_1, f_2, \dots$ converges pointwise on $[-1, 1]$ to $f$ but $f_1, f_2, \dots$ does not converge uniformly on $[-1, 1]$ to $f$, as you should verify.

[Figure: The graph of $f_k$.]
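The failure of uniform convergence can also be seen numerically. The sketch below is a sanity check, not a proof. It assumes, although the text does not state this, that the graph of $f_k$ is the tent $f_k(x) = 1 + \max(0, 1 - k|x|)$. That function equals 1 outside $(-\frac{1}{k}, \frac{1}{k})$ and rises to 2 at 0. The helper names are hypothetical. The sample point $x = \frac{1}{2k}$ keeps $\sup_{x \in [-1,1]} |f_k(x) - f(x)|$ at or near $\frac{1}{2}$ for every $k$. On a set of the form $[-1, -\frac{\varepsilon}{4}] \cup [\frac{\varepsilon}{4}, 1]$, the difference is 0 once $k \ge \frac{4}{\varepsilon}$.

```python
def f_k(x, k):
    # Assumed tent shape: 1 outside (-1/k, 1/k), rising linearly to 2 at x = 0.
    return 1 + max(0.0, 1 - k * abs(x))


def f(x):
    # Pointwise limit from Example 2.83.
    return 2.0 if x == 0 else 1.0


def sup_gap(k, points):
    """Largest |f_k(x) - f(x)| over finitely many sample points."""
    return max(abs(f_k(x, k) - f(x)) for x in points)


def grid(a, b, n):
    """n + 1 equally spaced sample points in [a, b]."""
    return [a + (b - a) * i / n for i in range(n + 1)]


eps = 0.1
whole = grid(-1, 1, 2000)
E = grid(-1, -eps / 4, 1000) + grid(eps / 4, 1, 1000)
for k in (10, 100, 1000):
    # Adding x = 1/(2k), where f_k is about 3/2 but f = 1, exposes a gap near 1/2.
    print(k, sup_gap(k, whole + [1 / (2 * k)]), sup_gap(k, E))
```

The point $x = \frac{1}{2k}$ is added explicitly because a fixed grid can miss the narrow spike when $k$ is large.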
Like the difference between continuity and uniform continuity, the difference
between pointwise convergence and uniform convergence lies in the order of the 
quantifiers. Take a moment to examine the definitions carefully. If a sequence of 
functions converges uniformly on some set, then it also converges pointwise on the 
same set; however, the converse is not true, as shown by Example 2.83. 

Example 2.83 also shows that the pointwise limit of continuous functions need not 
be continuous. However, the next result tells us that the uniform limit of continuous 
functions is continuous. 


2.84 uniform limit of continuous functions is continuous 


Suppose $B \subset \mathbf{R}$ and $f_1, f_2, \dots$ is a sequence of functions from $B$ to $\mathbf{R}$ that converges uniformly on $B$ to a function $f\colon B \to \mathbf{R}$. Suppose $b \in B$ and $f_k$ is continuous at $b$ for each $k \in \mathbf{Z}^+$. Then $f$ is continuous at $b$.


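Proof Suppose $\varepsilon > 0$. Let $n \in \mathbf{Z}^+$ be such that $|f_n(x) - f(x)| < \frac{\varepsilon}{3}$ for all $x \in B$. Because $f_n$ is continuous at $b$, there exists $\delta > 0$ such that $|f_n(x) - f_n(b)| < \frac{\varepsilon}{3}$ for all $x \in (b - \delta, b + \delta) \cap B$.

Now suppose $x \in (b - \delta, b + \delta) \cap B$. Then
$$|f(x) - f(b)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(b)| + |f_n(b) - f(b)| < \varepsilon.$$

Thus $f$ is continuous at $b$.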


Egorov’s Theorem 


A sequence of functions that converges pointwise need not converge uniformly. However, the next result says that a pointwise convergent sequence of functions on a measure space with finite total measure almost converges uniformly, in the sense that it converges uniformly except on a set that can have arbitrarily small measure.

As an example of the next result, consider Lebesgue measure $\lambda$ on the interval $[-1, 1]$ and the sequence of functions $f_1, f_2, \dots$ in Example 2.83 that converges pointwise but not uniformly on $[-1, 1]$. Suppose $\varepsilon > 0$. Then taking $E = [-1, -\frac{\varepsilon}{4}] \cup [\frac{\varepsilon}{4}, 1]$, we have $\lambda([-1, 1] \setminus E) < \varepsilon$ and $f_1, f_2, \dots$ converges uniformly on $E$, as in the conclusion of the next result.


Dmitri Egorov (1869-1931) proved the theorem below in 1911. You may encounter some books that spell his last name as Egoroff.

2.85 Egorov’s Theorem

Suppose $(X, \mathcal{S}, \mu)$ is a measure space with $\mu(X) < \infty$. Suppose $f_1, f_2, \dots$ is a sequence of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{R}$ that converges pointwise on $X$ to a function $f\colon X \to \mathbf{R}$. Then for every $\varepsilon > 0$, there exists a set $E \in \mathcal{S}$ such that $\mu(X \setminus E) < \varepsilon$ and $f_1, f_2, \dots$ converges uniformly to $f$ on $E$.
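Proof Suppose $\varepsilon > 0$. Temporarily fix $n \in \mathbf{Z}^+$. The definition of pointwise convergence implies that

2.86 $$\bigcup_{m=1}^{\infty} \bigcap_{k=m}^{\infty} \Bigl\{x \in X : |f_k(x) - f(x)| < \frac{1}{n}\Bigr\} = X.$$

For $m \in \mathbf{Z}^+$, let
$$A_{m,n} = \bigcap_{k=m}^{\infty} \Bigl\{x \in X : |f_k(x) - f(x)| < \frac{1}{n}\Bigr\}.$$
Then clearly $A_{1,n} \subset A_{2,n} \subset \cdots$ is an increasing sequence of sets and 2.86 can be rewritten as
$$\bigcup_{m=1}^{\infty} A_{m,n} = X.$$
The equation above implies (by 2.59) that $\lim_{m \to \infty} \mu(A_{m,n}) = \mu(X)$. Thus there exists $m_n \in \mathbf{Z}^+$ such that

2.87 $$\mu(X) - \mu(A_{m_n,n}) < \frac{\varepsilon}{2^n}.$$

Now let
$$E = \bigcap_{n=1}^{\infty} A_{m_n,n}.$$
Then
$$\begin{aligned} \mu(X \setminus E) &= \mu\Bigl(X \setminus \bigcap_{n=1}^{\infty} A_{m_n,n}\Bigr) \\ &= \mu\Bigl(\bigcup_{n=1}^{\infty} (X \setminus A_{m_n,n})\Bigr) \\ &\le \sum_{n=1}^{\infty} \mu(X \setminus A_{m_n,n}) \\ &< \varepsilon, \end{aligned}$$
where the last inequality follows from 2.87 (note that $\mu(X \setminus A_{m_n,n}) = \mu(X) - \mu(A_{m_n,n})$ because $\mu(X) < \infty$).

To complete the proof, we must verify that $f_1, f_2, \dots$ converges uniformly to $f$ on $E$. To do this, suppose $\varepsilon' > 0$. Let $n \in \mathbf{Z}^+$ be such that $\frac{1}{n} < \varepsilon'$. Then $E \subset A_{m_n,n}$, which implies that
$$|f_k(x) - f(x)| < \frac{1}{n} < \varepsilon'$$
for all $k \ge m_n$ and all $x \in E$. Hence $f_1, f_2, \dots$ does indeed converge uniformly to $f$ on $E$.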
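The construction of the indices $m_n$ can be traced in a concrete case. The sketch below uses an example that is not from the text: $f_k(x) = x^k$ on $X = [0, 1]$ with Lebesgue measure. Its pointwise limit is 0 on $[0, 1)$ and 1 at $x = 1$. Here $A_{m,n} = [0, n^{-1/m}) \cup \{1\}$, so $\mu(X) - \mu(A_{m,n}) = 1 - n^{-1/m}$. The code picks the smallest $m_n$ satisfying 2.87 for $n \le 12$. The function name `egorov_indices` is hypothetical, and the truncation at $n = 12$ is an assumption made to keep floating-point error negligible.

```python
import math


def egorov_indices(eps, N):
    """Hypothetical example: f_k(x) = x**k on X = [0, 1] with Lebesgue measure.

    Then A_{m,n} = [0, n**(-1/m)) together with {1}, so
    mu(X) - mu(A_{m,n}) = 1 - n**(-1/m). For n = 1, ..., N this returns the
    smallest m_n with that gap below eps / 2**n (as in 2.87), and the gaps."""
    ms, gaps = [], []
    for n in range(1, N + 1):
        delta = eps / 2**n
        L = math.log(n)
        # 1 - exp(-L/m) < delta  holds exactly when  L/m < -log(1 - delta)
        m = math.floor(L / -math.log1p(-delta)) + 1
        ms.append(m)
        gaps.append(-math.expm1(-L / m))   # 1 - n**(-1/m), computed stably
    return ms, gaps


ms, gaps = egorov_indices(0.1, 12)
print(ms)
print(sum(gaps))   # an upper bound for mu(X \ E) when E uses only n <= 12
# E (truncated at n = 12) is [0, r) together with {1}, where r = min n**(-1/m_n);
# on it sup |f_k - f| = r**k, which tends to 0 as k grows.
r = 1 - max(gaps)
print([r**k for k in (10, 100, 1000)])
```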
Approximation by Simple Functions


2.88 Definition simple function 


A function is called simple if it takes on only finitely many values. 


Suppose $(X, \mathcal{S})$ is a measurable space, $f\colon X \to \mathbf{R}$ is a simple function, and $c_1, \dots, c_n$ are the distinct nonzero values of $f$. Then
$$f = c_1 \chi_{E_1} + \cdots + c_n \chi_{E_n},$$
where $E_k = f^{-1}(\{c_k\})$. Thus this function $f$ is an $\mathcal{S}$-measurable function if and only if $E_1, \dots, E_n \in \mathcal{S}$ (as you should verify).


2.89 approximation by simple functions 


Suppose $(X, \mathcal{S})$ is a measurable space and $f\colon X \to [-\infty, \infty]$ is $\mathcal{S}$-measurable. Then there exists a sequence $f_1, f_2, \dots$ of functions from $X$ to $\mathbf{R}$ such that

(a) each $f_k$ is a simple $\mathcal{S}$-measurable function;

(b) $|f_k(x)| \le |f_{k+1}(x)| \le |f(x)|$ for all $k \in \mathbf{Z}^+$ and all $x \in X$;

(c) $\lim_{k \to \infty} f_k(x) = f(x)$ for every $x \in X$;

(d) $f_1, f_2, \dots$ converges uniformly on $X$ to $f$ if $f$ is bounded.


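Proof The idea of the proof is that for each $k \in \mathbf{Z}^+$ and $n \in \mathbf{Z}$, the interval $[n, n + 1]$ is divided into $2^k$ subintervals of equal length $\frac{1}{2^k}$. If $f(x) \in [0, k)$, we define $f_k(x)$ to be the left endpoint of the subinterval of the form $[\frac{m}{2^k}, \frac{m+1}{2^k})$ into which $f(x)$ falls. If $f(x) \in (-k, 0)$, we define $f_k(x)$ to be the right endpoint of the subinterval of the form $(\frac{m}{2^k}, \frac{m+1}{2^k}]$ into which $f(x)$ falls. In both cases $f(x)$ is rounded toward 0. If $|f(x)| \ge k$, we define $f_k(x)$ to be $\pm k$. Specifically, let
$$f_k(x) = \begin{cases} \dfrac{m}{2^k} & \text{if } 0 \le f(x) < k \text{ and } m \in \mathbf{Z} \text{ is such that } f(x) \in \bigl[\frac{m}{2^k}, \frac{m+1}{2^k}\bigr), \\[1ex] \dfrac{m+1}{2^k} & \text{if } -k < f(x) < 0 \text{ and } m \in \mathbf{Z} \text{ is such that } f(x) \in \bigl(\frac{m}{2^k}, \frac{m+1}{2^k}\bigr], \\[1ex] k & \text{if } k \le f(x), \\ -k & \text{if } f(x) \le -k. \end{cases}$$
Each $f^{-1}\bigl(\bigl[\frac{m}{2^k}, \frac{m+1}{2^k}\bigr)\bigr)$ and each $f^{-1}\bigl(\bigl(\frac{m}{2^k}, \frac{m+1}{2^k}\bigr]\bigr)$ is in $\mathcal{S}$ because $f$ is an $\mathcal{S}$-measurable function. Thus each $f_k$ is an $\mathcal{S}$-measurable simple function; in other words, (a) holds.
Also, (b) holds because of how we have defined $f_k$.
The definition of $f_k$ implies that

2.90 $$|f_k(x) - f(x)| \le \frac{1}{2^k} \quad \text{for all } x \in X \text{ such that } f(x) \in [-k, k].$$

Thus we see that (c) holds.
Finally, 2.90 shows that (d) holds.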
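The case analysis defining $f_k$ can be written as a short function of the value $y = f(x) \in [-\infty, \infty]$. The sketch below (function name hypothetical) works with values only. It illustrates (b), (c), and the estimate $|f_k(x) - f(x)| \le \frac{1}{2^k}$ pointwise, but it says nothing about measurability.

One design choice is made explicit. For $-k < y < 0$ the code rounds toward 0 with `math.ceil`, which selects the right endpoint of the subinterval $(\frac{m}{2^k}, \frac{m+1}{2^k}]$ containing $y$. If half-open intervals $[\frac{m}{2^k}, \frac{m+1}{2^k})$ were used on the negative side instead, a negative integer value $y = -k$ would give $f_k(x) = -k$ but $f_{k+1}(x) = -k + \frac{1}{2^{k+1}}$. That would violate (b).

```python
import math


def dyadic_approx(y, k):
    """Return f_k(x) for y = f(x) in [-inf, inf], following the proof of 2.89.

    Values in [0, k) are rounded down to a multiple of 1/2^k, values in (-k, 0)
    are rounded up (toward 0), and everything else is capped at +k or -k."""
    if y >= k:
        return float(k)
    if y <= -k:
        return float(-k)
    scaled = y * 2**k
    if y >= 0:
        return math.floor(scaled) / 2**k   # left endpoint of [m/2^k, (m+1)/2^k)
    return math.ceil(scaled) / 2**k        # right endpoint of (m/2^k, (m+1)/2^k]


samples = (0.7, -0.7, 2.5, -3.0, -3.2, math.pi, math.inf, -math.inf)
for y in samples:
    approx = [dyadic_approx(y, k) for k in range(1, 25)]
    # (b): |f_k(x)| <= |f_{k+1}(x)| <= |f(x)|
    assert all(abs(a) <= abs(b) <= abs(y) for a, b in zip(approx, approx[1:]))
    # 2.90: |f_k(x) - f(x)| <= 1/2^k whenever |f(x)| <= k
    for k, a in enumerate(approx, start=1):
        if abs(y) <= k:
            assert abs(a - y) <= 2.0**-k

print([dyadic_approx(0.7, k) for k in (1, 2, 3, 10)])
```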
Luzin’s Theorem


Our next result is surprising. It says that an arbitrary Borel measurable function is almost continuous, in the sense that its restriction to a large closed set is continuous. Here, the phrase large closed set means that we can take the complement of the closed set to have arbitrarily small measure.

Be careful about the interpretation of the conclusion of Luzin’s Theorem that $f|_B$ is a continuous function on $B$. This is not the same as saying that $f$ (on its original domain) is continuous at each point of $B$. For example, $\chi_{\mathbf{Q}}$ is discontinuous at every point of $\mathbf{R}$. However, $\chi_{\mathbf{Q}}|_{\mathbf{R} \setminus \mathbf{Q}}$ is a continuous function on $\mathbf{R} \setminus \mathbf{Q}$ (because this function is identically 0 on its domain).

Nikolai Luzin (1883-1950) proved the theorem below in 1912. Most mathematics literature in English refers to the result below as Lusin’s Theorem. However, Luzin is the correct transliteration from Russian into English; Lusin is the transliteration into German.


2.91 Luzin’s Theorem 


Suppose $g\colon \mathbf{R} \to \mathbf{R}$ is a Borel measurable function. Then for every $\varepsilon > 0$, there exists a closed set $F \subset \mathbf{R}$ such that $|\mathbf{R} \setminus F| < \varepsilon$ and $g|_F$ is a continuous function on $F$.


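Proof First consider the special case where $g = d_1 \chi_{D_1} + \cdots + d_n \chi_{D_n}$ for some distinct nonzero $d_1, \dots, d_n \in \mathbf{R}$ and some disjoint Borel sets $D_1, \dots, D_n \subset \mathbf{R}$. Suppose $\varepsilon > 0$. For each $k \in \{1, \dots, n\}$, there exist (by 2.71) a closed set $F_k \subset D_k$ and an open set $G_k \supset D_k$ such that
$$|G_k \setminus D_k| < \frac{\varepsilon}{2n} \quad \text{and} \quad |D_k \setminus F_k| < \frac{\varepsilon}{2n}.$$
Because $G_k \setminus F_k = (G_k \setminus D_k) \cup (D_k \setminus F_k)$, we have $|G_k \setminus F_k| < \frac{\varepsilon}{n}$ for each $k \in \{1, \dots, n\}$.
Let
$$F = \Bigl(\bigcup_{k=1}^{n} F_k\Bigr) \cup \Bigl(\bigcap_{k=1}^{n} (\mathbf{R} \setminus G_k)\Bigr).$$
Then $F$ is a closed subset of $\mathbf{R}$ and $\mathbf{R} \setminus F \subset \bigcup_{k=1}^{n} (G_k \setminus F_k)$. Thus $|\mathbf{R} \setminus F| < \varepsilon$.
Because $F_k \subset D_k$, we see that $g$ is identically $d_k$ on $F_k$. Thus $g|_{F_k}$ is continuous for each $k \in \{1, \dots, n\}$. Because
$$\bigcap_{k=1}^{n} (\mathbf{R} \setminus G_k) \subset \bigcap_{k=1}^{n} (\mathbf{R} \setminus D_k),$$
we see that $g$ is identically 0 on $\bigcap_{k=1}^{n} (\mathbf{R} \setminus G_k)$. Thus $g|_{\bigcap_{k=1}^{n} (\mathbf{R} \setminus G_k)}$ is continuous. Putting all this together, we conclude that $g|_F$ is continuous. To see this, use Exercise 9 in this section: the closed sets $F_1, \dots, F_n, \bigcap_{k=1}^{n} (\mathbf{R} \setminus G_k)$ are disjoint because $F_k \subset D_k \subset G_k$. This completes the proof in this special case.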

Now consider an arbitrary Borel measurable function $g\colon \mathbf{R} \to \mathbf{R}$. By 2.89, there exists a sequence $g_1, g_2, \dots$ of functions from $\mathbf{R}$ to $\mathbf{R}$ that converges pointwise on $\mathbf{R}$ to $g$, where each $g_k$ is a simple Borel measurable function.
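Suppose $\varepsilon > 0$. By the special case already proved, for each $k \in \mathbf{Z}^+$, there exists a closed set $C_k \subset \mathbf{R}$ such that $|\mathbf{R} \setminus C_k| < \frac{\varepsilon}{2^{k+2}}$ and $g_k|_{C_k}$ is continuous. Let
$$C = \bigcap_{k=1}^{\infty} C_k.$$
Thus $C$ is a closed set and $g_k|_C$ is continuous for every $k \in \mathbf{Z}^+$. Note that
$$\mathbf{R} \setminus C = \bigcup_{k=1}^{\infty} (\mathbf{R} \setminus C_k);$$
thus $|\mathbf{R} \setminus C| < \frac{\varepsilon}{4}$.

For each $m \in \mathbf{Z}$, the sequence $g_1|_{(m,m+1)}, g_2|_{(m,m+1)}, \dots$ converges pointwise on $(m, m+1)$ to $g|_{(m,m+1)}$. Thus by Egorov’s Theorem (2.85), for each $m \in \mathbf{Z}$, there is a Borel set $E_m \subset (m, m+1)$ such that $g_1, g_2, \dots$ converges uniformly to $g$ on $E_m$ and
$$|(m, m+1) \setminus E_m| < \frac{\varepsilon}{2^{|m|+3}}.$$
Thus $g_1, g_2, \dots$ converges uniformly to $g$ on $C \cap E_m$ for each $m \in \mathbf{Z}$. Because each $g_k|_C$ is continuous, we conclude (using 2.84) that $g|_{C \cap E_m}$ is continuous for each $m \in \mathbf{Z}$. Thus $g|_D$ is continuous, where
$$D = \bigcup_{m \in \mathbf{Z}} (C \cap E_m).$$
Because
$$\mathbf{R} \setminus D \subset \mathbf{Z} \cup \Bigl(\bigcup_{m \in \mathbf{Z}} \bigl((m, m+1) \setminus E_m\bigr)\Bigr) \cup (\mathbf{R} \setminus C),$$
we have $|\mathbf{R} \setminus D| < \varepsilon$. Indeed, the first set on the right is countable, the second has measure at most $\frac{3\varepsilon}{8}$, and the third has measure less than $\frac{\varepsilon}{4}$.

There exists a closed set $F \subset D$ such that $|D \setminus F| < \varepsilon - |\mathbf{R} \setminus D|$ (by 2.65). Now
$$|\mathbf{R} \setminus F| = |(\mathbf{R} \setminus D) \cup (D \setminus F)| \le |\mathbf{R} \setminus D| + |D \setminus F| < \varepsilon.$$

Because the restriction of a continuous function to a smaller domain is also continuous, $g|_F$ is continuous, completing the proof.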


We need the following result to get another version of Luzin’s Theorem. 


2.92 continuous extensions of continuous functions 


• Every continuous function on a closed subset of $\mathbf{R}$ can be extended to a continuous function on all of $\mathbf{R}$.

• More precisely, if $F \subset \mathbf{R}$ is closed and $g\colon F \to \mathbf{R}$ is continuous, then there exists a continuous function $h\colon \mathbf{R} \to \mathbf{R}$ such that $h|_F = g$.


Proof Suppose $F \subset \mathbf{R}$ is closed and $g\colon F \to \mathbf{R}$ is continuous. Thus $\mathbf{R} \setminus F$ is the union of a collection of disjoint open intervals $\{I_k\}$. For each such interval of the form $(a, \infty)$ or of the form $(-\infty, a)$, define $h(x) = g(a)$ for all $x$ in the interval.

For each interval $I_k$ of the form $(b, c)$ with $b < c$ and $b, c \in \mathbf{R}$, define $h$ on $[b, c]$ to be the linear function such that $h(b) = g(b)$ and $h(c) = g(c)$.

Define $h(x) = g(x)$ for all $x \in \mathbf{R}$ for which $h(x)$ has not been defined by the previous two paragraphs. Then $h\colon \mathbf{R} \to \mathbf{R}$ is continuous and $h|_F = g$.
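The proof is constructive. When $F$ is a finite union of disjoint closed bounded intervals (single points allowed), it can be coded directly. The sketch below makes that simplifying assumption, which the theorem does not need. The helper name `extend_continuous` is hypothetical. To the left and right of $F$ the extension is constant, and on each bounded gap $(b, c)$ it interpolates linearly between $g(b)$ and $g(c)$.

```python
import bisect


def extend_continuous(intervals, g):
    """Hypothetical helper following the proof of 2.92 in a special case.

    F is assumed to be a finite union of disjoint closed bounded intervals,
    given as [(a_1, b_1), ..., (a_r, b_r)] with a_i <= b_i < a_{i+1}; a pair
    (a, a) stands for an isolated point. Returns h: R -> R with h = g on F."""
    lefts = [a for a, _ in intervals]

    def h(x):
        i = bisect.bisect_right(lefts, x) - 1    # last interval with a_i <= x
        if i < 0:                                # x < a_1: constant g(a_1)
            return g(intervals[0][0])
        a, b = intervals[i]
        if x <= b:                               # x in F: use g itself
            return g(x)
        if i == len(intervals) - 1:              # x > b_r: constant g(b_r)
            return g(b)
        c = intervals[i + 1][0]                  # x in the gap (b, c)
        t = (x - b) / (c - b)
        return (1 - t) * g(b) + t * g(c)         # linear from g(b) to g(c)

    return h


h = extend_continuous([(0, 1), (3, 4)], lambda x: x * x)
print([h(x) for x in (-1, 0.5, 2, 3, 5)])
```

For a general closed set, $\mathbf{R} \setminus F$ can have infinitely many component intervals. In that case the proof's construction is not a finite algorithm, so the sketch covers only the finite case.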


The next result gives a slightly modified way to state Luzin’s Theorem. You can 
think of this version as saying that the value of a Borel measurable function can be 
changed on a set with small Lebesgue measure to produce a continuous function. 


2.93 Luzin’s Theorem, second version 


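Suppose $E \subset \mathbf{R}$ and $g\colon E \to \mathbf{R}$ is a Borel measurable function. Then for every $\varepsilon > 0$, there exists a closed set $F \subset E$ and a continuous function $h\colon \mathbf{R} \to \mathbf{R}$ such that $|E \setminus F| < \varepsilon$ and $h|_F = g|_F$.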


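Proof Suppose $\varepsilon > 0$. Extend $g$ to a function $\tilde{g}\colon \mathbf{R} \to \mathbf{R}$ by defining
$$\tilde{g}(x) = \begin{cases} g(x) & \text{if } x \in E, \\ 0 & \text{if } x \in \mathbf{R} \setminus E. \end{cases}$$
By the first version of Luzin’s Theorem (2.91), there is a closed set $C \subset \mathbf{R}$ such that $|\mathbf{R} \setminus C| < \varepsilon$ and $\tilde{g}|_C$ is a continuous function on $C$. There exists a closed set $F \subset C \cap E$ such that $|(C \cap E) \setminus F| < \varepsilon - |\mathbf{R} \setminus C|$ (by 2.65). Thus
$$|E \setminus F| \le \bigl|((C \cap E) \setminus F) \cup (\mathbf{R} \setminus C)\bigr| \le |(C \cap E) \setminus F| + |\mathbf{R} \setminus C| < \varepsilon.$$
Now $\tilde{g}|_F$ is a continuous function on $F$. Also, $\tilde{g}|_F = g|_F$ (because $F \subset E$). Use 2.92 to extend $\tilde{g}|_F$ to a continuous function $h\colon \mathbf{R} \to \mathbf{R}$.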


The building at Moscow State University where the mathematics seminar organized 
by Egorov and Luzin met. Both Egorov and Luzin had been students at Moscow State 
University and then later became faculty members at the same institution. Luzin’s 
PhD thesis advisor was Egorov.
Lebesgue Measurable Functions


2.94 Definition Lebesgue measurable function 


A function $f\colon A \to \mathbf{R}$, where $A \subset \mathbf{R}$, is called Lebesgue measurable if $f^{-1}(B)$ is a Lebesgue measurable set for every Borel set $B \subset \mathbf{R}$.

If $f\colon A \to \mathbf{R}$ is a Lebesgue measurable function, then $A$ is a Lebesgue measurable subset of $\mathbf{R}$ [because $A = f^{-1}(\mathbf{R})$]. If $A$ is a Lebesgue measurable subset of $\mathbf{R}$, then the definition above is the standard definition of an $\mathcal{S}$-measurable function, where $\mathcal{S}$ is the $\sigma$-algebra of all Lebesgue measurable subsets of $A$.

The following list summarizes and reviews some crucial definitions and results: 


• A Borel set is an element of the smallest $\sigma$-algebra on $\mathbf{R}$ that contains all the open subsets of $\mathbf{R}$.

• A Lebesgue measurable set is an element of the smallest $\sigma$-algebra on $\mathbf{R}$ that contains all the open subsets of $\mathbf{R}$ and all the subsets of $\mathbf{R}$ with outer measure 0.

• The terminology Lebesgue set would make good sense in parallel to the terminology Borel set. However, Lebesgue set has another meaning, so we need to use Lebesgue measurable set.

• Every Lebesgue measurable set differs from a Borel set by a set with outer measure 0. The Borel set can be taken either to be contained in the Lebesgue measurable set or to contain the Lebesgue measurable set.

• Outer measure restricted to the $\sigma$-algebra of Borel sets is called Lebesgue measure.

• Outer measure restricted to the $\sigma$-algebra of Lebesgue measurable sets is also called Lebesgue measure.

• Outer measure is not a measure on the $\sigma$-algebra of all subsets of $\mathbf{R}$.

• A function $f\colon A \to \mathbf{R}$, where $A \subset \mathbf{R}$, is called Borel measurable if $f^{-1}(B)$ is a Borel set for every Borel set $B \subset \mathbf{R}$.

• A function $f\colon A \to \mathbf{R}$, where $A \subset \mathbf{R}$, is called Lebesgue measurable if $f^{-1}(B)$ is a Lebesgue measurable set for every Borel set $B \subset \mathbf{R}$.


Although there exist Lebesgue measurable sets that are not Borel sets, you are unlikely to encounter one. Similarly, a Lebesgue measurable function that is not Borel measurable is unlikely to arise in anything you do. A great way to simplify the potential confusion about Lebesgue measurable functions being defined by inverse images of Borel sets is to consider only Borel measurable functions.

“Passing from Borel to Lebesgue measurable functions is the work of the devil. Don’t even consider it!”
Barry Simon (winner of the American Mathematical Society Steele Prize for Lifetime Achievement), in his five-volume series A Comprehensive Course in Analysis
The next result states that if we adopt the philosophy that what happens on a set of outer measure 0 does not matter much, then we might as well restrict our attention to Borel measurable functions.

“He professes to have received no sinister measure.”
Measure for Measure, by William Shakespeare


2.95 every Lebesgue measurable function is almost Borel measurable 


Suppose $f\colon \mathbf{R} \to \mathbf{R}$ is a Lebesgue measurable function. Then there exists a Borel measurable function $g\colon \mathbf{R} \to \mathbf{R}$ such that
$$|\{x \in \mathbf{R} : g(x) \neq f(x)\}| = 0.$$


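Proof There exists a sequence $f_1, f_2, \dots$ of Lebesgue measurable simple functions from $\mathbf{R}$ to $\mathbf{R}$ converging pointwise on $\mathbf{R}$ to $f$ (by 2.89). Suppose $k \in \mathbf{Z}^+$. Then there exist $c_1, \dots, c_n \in \mathbf{R}$ and disjoint Lebesgue measurable sets $A_1, \dots, A_n \subset \mathbf{R}$ such that
$$f_k = c_1 \chi_{A_1} + \cdots + c_n \chi_{A_n}.$$
For each $j \in \{1, \dots, n\}$, there exists a Borel set $B_j \subset A_j$ such that $|A_j \setminus B_j| = 0$ [by the equivalence of (a) and (d) in 2.71]. Let
$$g_k = c_1 \chi_{B_1} + \cdots + c_n \chi_{B_n}.$$
Then $g_k$ is a Borel measurable function and $|\{x \in \mathbf{R} : g_k(x) \neq f_k(x)\}| = 0$.

If $x \notin \bigcup_{k=1}^{\infty} \{x \in \mathbf{R} : g_k(x) \neq f_k(x)\}$, then $g_k(x) = f_k(x)$ for all $k \in \mathbf{Z}^+$ and hence $\lim_{k \to \infty} g_k(x) = f(x)$. Let
$$E = \{x \in \mathbf{R} : \lim_{k \to \infty} g_k(x) \text{ exists in } \mathbf{R}\}.$$
Then $E$ is a Borel subset of $\mathbf{R}$ [by Exercise 14(b) in Section 2B]. Also,
$$\mathbf{R} \setminus E \subset \bigcup_{k=1}^{\infty} \{x \in \mathbf{R} : g_k(x) \neq f_k(x)\}$$
and thus $|\mathbf{R} \setminus E| = 0$. For $x \in \mathbf{R}$, let

2.96 $$g(x) = \lim_{k \to \infty} (\chi_E g_k)(x).$$

If $x \in E$, then the limit above exists by the definition of $E$; if $x \in \mathbf{R} \setminus E$, then the limit above exists because $(\chi_E g_k)(x) = 0$ for all $k \in \mathbf{Z}^+$.

For each $k \in \mathbf{Z}^+$, the function $\chi_E g_k$ is Borel measurable. Thus 2.96 implies that $g$ is a Borel measurable function (by 2.48). Because
$$\{x \in \mathbf{R} : g(x) \neq f(x)\} \subset \bigcup_{k=1}^{\infty} \{x \in \mathbf{R} : g_k(x) \neq f_k(x)\},$$
we have $|\{x \in \mathbf{R} : g(x) \neq f(x)\}| = 0$, completing the proof.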
EXERCISES 2E


1 Suppose X is a finite set. Explain why a sequence of functions from X to R that 
converges pointwise on X also converges uniformly on X. 


2 Give an example of a sequence of functions from $\mathbf{Z}^+$ to $\mathbf{R}$ that converges pointwise on $\mathbf{Z}^+$ but does not converge uniformly on $\mathbf{Z}^+$.


3 Give an example of a sequence of continuous functions $f_1, f_2, \dots$ from $[0, 1]$ to $\mathbf{R}$ that converge pointwise to a function $f\colon [0, 1] \to \mathbf{R}$ that is not a bounded function.


4 Prove or give a counterexample: If $A \subset \mathbf{R}$ and $f_1, f_2, \dots$ is a sequence of uniformly continuous functions from $A$ to $\mathbf{R}$ that converge uniformly to a function $f\colon A \to \mathbf{R}$, then $f$ is uniformly continuous on $A$.


5 Give an example to show that Egorov’s Theorem can fail without the hypothesis that $\mu(X) < \infty$.


6 Suppose $(X, \mathcal{S}, \mu)$ is a measure space with $\mu(X) < \infty$. Suppose $f_1, f_2, \dots$ is a sequence of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{R}$ such that $\lim_{k \to \infty} f_k(x) = \infty$ for each $x \in X$. Prove that for every $\varepsilon > 0$, there exists a set $E \in \mathcal{S}$ such that $\mu(X \setminus E) < \varepsilon$ and $f_1, f_2, \dots$ converges uniformly to $\infty$ on $E$. This means that for every $t > 0$, there exists $n \in \mathbf{Z}^+$ such that $f_k(x) > t$ for all integers $k \ge n$ and all $x \in E$.

[The exercise above is an Egorov-type theorem for sequences of functions that converge pointwise to $\infty$.]


7 Suppose $F$ is a closed bounded subset of $\mathbf{R}$ and $g_1, g_2, \dots$ is an increasing sequence of continuous real-valued functions on $F$ (thus $g_1(x) \le g_2(x) \le \cdots$ for all $x \in F$) such that $\sup\{g_1(x), g_2(x), \dots\} < \infty$ for each $x \in F$. Define a real-valued function $g$ on $F$ by
$$g(x) = \lim_{k \to \infty} g_k(x).$$
Prove that $g$ is continuous on $F$ if and only if $g_1, g_2, \dots$ converges uniformly on $F$ to $g$.

[The result above is called Dini’s Theorem.]
8 Suppose $\mu$ is the measure on $(\mathbf{Z}^+, 2^{\mathbf{Z}^+})$ defined by
$$\mu(E) = \sum_{n \in E} \frac{1}{2^n}.$$
Prove that for every $\varepsilon > 0$, there exists a set $E \subset \mathbf{Z}^+$ with $\mu(\mathbf{Z}^+ \setminus E) < \varepsilon$ such that $f_1, f_2, \dots$ converges uniformly on $E$ for every sequence of functions $f_1, f_2, \dots$ from $\mathbf{Z}^+$ to $\mathbf{R}$ that converges pointwise on $\mathbf{Z}^+$.

[This result does not follow from Egorov’s Theorem because here we are asking for $E$ to depend only on $\varepsilon$. In Egorov’s Theorem, $E$ depends on $\varepsilon$ and on the sequence $f_1, f_2, \dots$.]
9 Suppose $F_1, \dots, F_n$ are disjoint closed subsets of $\mathbf{R}$. Prove that if
$$g\colon F_1 \cup \cdots \cup F_n \to \mathbf{R}$$
is a function such that $g|_{F_k}$ is a continuous function for each $k \in \{1, \dots, n\}$, then $g$ is a continuous function.


10 Suppose $F \subset \mathbf{R}$ is such that every continuous function from $F$ to $\mathbf{R}$ can be extended to a continuous function from $\mathbf{R}$ to $\mathbf{R}$. Prove that $F$ is a closed subset of $\mathbf{R}$.


11 Prove or give a counterexample: If $F \subset \mathbf{R}$ is such that every bounded continuous function from $F$ to $\mathbf{R}$ can be extended to a continuous function from $\mathbf{R}$ to $\mathbf{R}$, then $F$ is a closed subset of $\mathbf{R}$.


12 Give an example of a Borel measurable function $f$ from $\mathbf{R}$ to $\mathbf{R}$ such that there does not exist a set $B \subset \mathbf{R}$ such that $|\mathbf{R} \setminus B| = 0$ and $f|_B$ is a continuous function on $B$.


13 Prove or give a counterexample: If $f_t\colon \mathbf{R} \to \mathbf{R}$ is a Borel measurable function for each $t \in \mathbf{R}$ and $f\colon \mathbf{R} \to (-\infty, \infty]$ is defined by
$$f(x) = \sup\{f_t(x) : t \in \mathbf{R}\},$$
then $f$ is a Borel measurable function.
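14 Suppose $b_1, b_2, \dots$ is a sequence of real numbers. Define $f\colon \mathbf{R} \to [0, \infty]$ by
$$f(x) = \begin{cases} \displaystyle\sum_{k=1}^{\infty} \frac{1}{4^k |x - b_k|} & \text{if } x \notin \{b_1, b_2, \dots\}, \\ \infty & \text{if } x \in \{b_1, b_2, \dots\}. \end{cases}$$
Prove that $|\{x \in \mathbf{R} : f(x) < 1\}| = \infty$.

[This exercise is a variation of a problem originally considered by Borel. If $b_1, b_2, \dots$ contains all the rational numbers, then it is not even obvious that $\{x \in \mathbf{R} : f(x) < \infty\} \neq \emptyset$.]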


15 Suppose $B$ is a Borel set and $f\colon B \to \mathbf{R}$ is a Lebesgue measurable function. Show that there exists a Borel measurable function $g\colon B \to \mathbf{R}$ such that
$$|\{x \in B : g(x) \neq f(x)\}| = 0.$$
Chapter 3
Integration


To remedy deficiencies of Riemann integration that were discussed in Section 1B, 
in the last chapter we developed measure theory as an extension of the notion of the 
length of an interval. Having proved the fundamental results about measures, we are 
now ready to use measures to develop integration with respect to a measure. 

As we will see, this new method of integration fixes many of the problems with Riemann integration. In particular, we will develop good theorems for interchanging limits and integrals.


Statue in Milan of Maria Gaetana Agnesi, who in 1748 published one of the first calculus textbooks. A translation of her book into English was published in 1801. In this chapter, we develop a method of integration more powerful than methods contemplated by the pioneers of calculus.
©Giovanni Dall’Orto


© Sheldon Axler 2020
S. Axler, Measure, Integration & Real Analysis, Graduate Texts in Mathematics 282, https://doi.org/10.1007/978-3-030-33143-6_3
3A Integration with Respect to a Measure


Integration of Nonnegative Functions 


We will first define the integral of a nonnegative function with respect to a measure. 
Then by writing a real-valued function as the difference of two nonnegative functions, 
we will define the integral of a real-valued function with respect to a measure. We 
begin this process with the following definition. 


3.1 Definition $\mathcal{S}$-partition

Suppose $\mathcal{S}$ is a $\sigma$-algebra on a set $X$. An $\mathcal{S}$-partition of $X$ is a finite collection $A_1, \dots, A_m$ of disjoint sets in $\mathcal{S}$ such that $A_1 \cup \cdots \cup A_m = X$.


The next definition should remind you of the definition of the lower Riemann sum (see 1.3). However, now we are working with an arbitrary measure and thus $X$ need not be a subset of $\mathbf{R}$. More importantly, even in the case when $X$ is a closed interval $[a, b]$ in $\mathbf{R}$ and $\mu$ is Lebesgue measure on the Borel subsets of $[a, b]$, the sets $A_1, \dots, A_m$ in the definition below do not need to be subintervals of $[a, b]$ as they do for the lower Riemann sum; they need only be Borel sets.

We adopt the convention that $0 \cdot \infty$ and $\infty \cdot 0$ should both be interpreted to be 0.


3.2 Definition lower Lebesgue sum

Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $f\colon X \to [0, \infty]$ is an $\mathcal{S}$-measurable function, and $P$ is an $\mathcal{S}$-partition $A_1, \dots, A_m$ of $X$. The lower Lebesgue sum $\mathcal{L}(f, P)$ is defined by
$$\mathcal{L}(f, P) = \sum_{j=1}^{m} \mu(A_j) \inf_{A_j} f.$$


Suppose $(X, \mathcal{S}, \mu)$ is a measure space. We will denote the integral of an $\mathcal{S}$-measurable function $f$ with respect to $\mu$ by $\int f \, d\mu$. Our basic requirements for an integral are that we want $\int \chi_E \, d\mu$ to equal $\mu(E)$ for all $E \in \mathcal{S}$, and we want $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$. As we will see, the following definition satisfies both of those requirements (although this is not obvious). Think about why the following definition is reasonable in terms of the integral equaling the area under the graph of the function (in the special case of Lebesgue measure on an interval of $\mathbf{R}$).


3.3 Definition integral of a nonnegative function

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [0, \infty]$ is an $\mathcal{S}$-measurable function. The integral of $f$ with respect to $\mu$, denoted $\int f \, d\mu$, is defined by
$$\int f \, d\mu = \sup\{\mathcal{L}(f, P) : P \text{ is an } \mathcal{S}\text{-partition of } X\}.$$
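For a finite set $X$ with $\mathcal{S} = 2^X$, Definitions 3.2 and 3.3 can be computed by brute force, because there are only finitely many partitions. The sketch below is a toy illustration under those assumptions. The point masses $\mu(\{x\})$ and the helper names are hypothetical. It uses the convention $0 \cdot \infty = 0$, and an empty block contributes 0. Finer partitions give larger lower sums. For finite $X$ the supremum equals $\sum_{x \in X} \mu(\{x\}) f(x)$, which is attained by the partition into singletons.

```python
import math


def mul(a, b):
    """Product in [0, inf] with the convention 0 * inf = inf * 0 = 0."""
    if a == 0 or b == 0:
        return 0
    return a * b


def lower_lebesgue_sum(f, mu, partition):
    """L(f, P) = sum over blocks A of mu(A) * inf_A f, for a finite set X.

    f and mu are dicts on the points of X (mu gives the mass of each singleton);
    partition is a list of disjoint sets whose union is X."""
    total = 0
    for block in partition:
        if not block:          # mu(empty set) = 0, so the term is 0
            continue
        total += mul(sum(mu[x] for x in block), min(f[x] for x in block))
    return total


def set_partitions(items):
    """Yield every partition of the list items into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):      # put `first` into an existing block
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        yield part + [{first}]          # or give it a block of its own


X = [1, 2, 3]
mu = {1: 1, 2: 2, 3: 0}
f = {1: 5, 2: 1, 3: math.inf}
print(lower_lebesgue_sum(f, mu, [{1, 2, 3}]))       # coarse partition
print(lower_lebesgue_sum(f, mu, [{1}, {2}, {3}]))   # singletons
print(max(lower_lebesgue_sum(f, mu, P) for P in set_partitions(X)))
```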




Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [0, \infty]$ is an $\mathcal{S}$-measurable function. Each $\mathcal{S}$-partition $A_1, \dots, A_m$ of $X$ leads to an approximation of $f$ from below by the $\mathcal{S}$-measurable simple function $\sum_{j=1}^{m} \bigl(\inf_{A_j} f\bigr) \chi_{A_j}$. This suggests that
$$\sum_{j=1}^{m} \mu(A_j) \inf_{A_j} f$$
should be an approximation from below of our intuitive notion of $\int f \, d\mu$. Taking the supremum of these approximations leads to our definition of $\int f \, d\mu$.
The following result gives our first example of evaluating an integral. 


3.4 integral of a characteristic function

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $E \in \mathcal{S}$. Then
$$\int \chi_E \, d\mu = \mu(E).$$

Proof If $P$ is the $\mathcal{S}$-partition of $X$ consisting of $E$ and its complement $X \setminus E$, then clearly $\mathcal{L}(\chi_E, P) = \mu(E)$. Thus $\int \chi_E \, d\mu \ge \mu(E)$.

To prove the inequality in the other direction, suppose $P$ is an $\mathcal{S}$-partition $A_1, \dots, A_m$ of $X$. Then $\mu(A_j) \inf_{A_j} \chi_E$ equals $\mu(A_j)$ if $A_j \subset E$ and equals 0 otherwise. Thus
$$\mathcal{L}(\chi_E, P) = \sum_{\{j : A_j \subset E\}} \mu(A_j) = \mu\Bigl(\bigcup_{\{j : A_j \subset E\}} A_j\Bigr) \le \mu(E).$$
Thus $\int \chi_E \, d\mu \le \mu(E)$, completing the proof.

The symbol $d$ in the expression $\int f \, d\mu$ has no independent meaning, but it often usefully separates $f$ from $\mu$. Because the $d$ in $\int f \, d\mu$ does not represent another object, some mathematicians prefer typesetting an upright d in this situation, producing $\int f \, \mathrm{d}\mu$. However, the upright d looks jarring to some readers who are accustomed to italicized symbols. This book takes the compromise position of using slanted d instead of math-mode italicized d in integrals.


3.5 Example integrals of $\chi_{\mathbf{Q}}$ and $\chi_{[0,1] \setminus \mathbf{Q}}$

Suppose $\lambda$ is Lebesgue measure on $\mathbf{R}$. As a special case of the result above, we have $\int \chi_{\mathbf{Q}} \, d\lambda = 0$ (because $|\mathbf{Q}| = 0$). Recall that $\chi_{\mathbf{Q}}$ is not Riemann integrable on $[0, 1]$. Thus even at this early stage in our development of integration with respect to a measure, we have fixed one of the deficiencies of Riemann integration.

Note also that 3.4 implies that $\int \chi_{[0,1] \setminus \mathbf{Q}} \, d\lambda = 1$ (because $|[0, 1] \setminus \mathbf{Q}| = 1$), which is what we want. In contrast, the lower Riemann integral of $\chi_{[0,1] \setminus \mathbf{Q}}$ on $[0, 1]$ equals 0, which is not what we want.
3.6 Example integration with respect to counting measure is summation


Suppose $\mu$ is counting measure on $\mathbf{Z}^+$ and $b_1, b_2, \dots$ is a sequence of nonnegative numbers. Think of $b$ as the function from $\mathbf{Z}^+$ to $[0, \infty)$ defined by $b(k) = b_k$. Then
$$\int b \, d\mu = \sum_{k=1}^{\infty} b_k,$$
as you should verify.
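Part of that verification can be checked by machine. For counting measure on $\mathbf{Z}^+$, take the $\mathcal{S}$-partition $P_N$ consisting of $\{1\}, \dots, \{N\}$ and the infinite tail $\{N+1, N+2, \dots\}$. Its lower Lebesgue sum is $b_1 + \cdots + b_N$ plus $\infty \cdot \inf_{k > N} b_k$, and the tail term is 0 exactly when that infimum is 0. The sketch below (names hypothetical) takes the tail infimum as an input rather than computing it, and it uses exact rational arithmetic. It shows only that lower sums reach every partial sum, which is one of the two inequalities behind $\int b \, d\mu = \sum_{k=1}^\infty b_k$. When the tail infimum is positive, both sides equal $\infty$.

```python
import math
from fractions import Fraction


def lower_sum_counting(b, N, tail_inf):
    """L(b, P_N) for counting measure on Z^+, with P_N = {1}, ..., {N}, {N+1, ...}.

    b maps k to b_k >= 0; tail_inf must be inf{b_k : k > N}, supplied by the caller.
    The tail block has counting measure infinity, and infinity * 0 = 0."""
    head = sum(b(k) for k in range(1, N + 1))
    tail = 0 if tail_inf == 0 else math.inf
    return head + tail


def b(k):
    # b_k = 1/2^k, whose tails have infimum 0.
    return Fraction(1, 2**k)


partial = [lower_sum_counting(b, N, 0) for N in (1, 2, 3, 10)]
print(partial)                                  # 1 - 1/2^N for each N
print(lower_sum_counting(lambda k: 1, 5, 1))   # constant sequence: inf
```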


Integration with respect to a measure can be called Lebesgue integration. The 
next result shows that Lebesgue integration behaves as expected on simple functions 
represented as linear combinations of characteristic functions of disjoint sets. 


3.7 integral of a simple function 


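Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $E_1, \dots, E_n$ are disjoint sets in $\mathcal{S}$, and $c_1, \dots, c_n \in [0, \infty]$. Then
$$\int \Bigl(\sum_{k=1}^{n} c_k \chi_{E_k}\Bigr) d\mu = \sum_{k=1}^{n} c_k \mu(E_k).$$

Proof Without loss of generality, we can assume that $E_1, \dots, E_n$ is an $\mathcal{S}$-partition of $X$ [by replacing $n$ by $n + 1$ and setting $E_{n+1} = X \setminus (E_1 \cup \cdots \cup E_n)$ and $c_{n+1} = 0$].

If $P$ is the $\mathcal{S}$-partition $E_1, \dots, E_n$ of $X$, then $\mathcal{L}\bigl(\sum_{k=1}^{n} c_k \chi_{E_k}, P\bigr) = \sum_{k=1}^{n} c_k \mu(E_k)$. Thus
$$\int \Bigl(\sum_{k=1}^{n} c_k \chi_{E_k}\Bigr) d\mu \ge \sum_{k=1}^{n} c_k \mu(E_k).$$

To prove the inequality in the other direction, suppose that $P$ is an $\mathcal{S}$-partition $A_1, \dots, A_m$ of $X$. Then
$$\begin{aligned} \mathcal{L}\Bigl(\sum_{k=1}^{n} c_k \chi_{E_k}, P\Bigr) &= \sum_{j=1}^{m} \mu(A_j) \min_{\{i : A_j \cap E_i \neq \emptyset\}} c_i \\ &= \sum_{j=1}^{m} \sum_{k=1}^{n} \mu(A_j \cap E_k) \min_{\{i : A_j \cap E_i \neq \emptyset\}} c_i \\ &\le \sum_{j=1}^{m} \sum_{k=1}^{n} \mu(A_j \cap E_k)\, c_k \\ &= \sum_{k=1}^{n} c_k \sum_{j=1}^{m} \mu(A_j \cap E_k) \\ &= \sum_{k=1}^{n} c_k \mu(E_k). \end{aligned}$$
The inequality above implies that $\int \bigl(\sum_{k=1}^{n} c_k \chi_{E_k}\bigr) d\mu \le \sum_{k=1}^{n} c_k \mu(E_k)$, completing the proof.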
The next easy result gives an unsurprising property of integrals.


3.8 integration is order preserving 


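Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f, g\colon X \to [0, \infty]$ are $\mathcal{S}$-measurable functions such that $f(x) \le g(x)$ for all $x \in X$. Then $\int f \, d\mu \le \int g \, d\mu$.

Proof Suppose $P$ is an $\mathcal{S}$-partition $A_1, \dots, A_m$ of $X$. Then
$$\inf_{A_j} f \le \inf_{A_j} g$$
for each $j = 1, \dots, m$. Thus $\mathcal{L}(f, P) \le \mathcal{L}(g, P)$. Hence $\int f \, d\mu \le \int g \, d\mu$.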


Monotone Convergence Theorem 


For the proof of the Monotone Convergence Theorem (and several other results), we 
will need to use the following mild restatement of the definition of the integral of a 
nonnegative function. 


3.9 integrals via simple functions 


Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [0, \infty]$ is $\mathcal{S}$-measurable. Then

3.10 $$\int f \, d\mu = \sup\Bigl\{\sum_{j=1}^{m} c_j \mu(A_j) : A_1, \dots, A_m \text{ are disjoint sets in } \mathcal{S},\ c_1, \dots, c_m \in [0, \infty), \text{ and } f(x) \ge \sum_{j=1}^{m} c_j \chi_{A_j}(x) \text{ for every } x \in X\Bigr\}.$$

Proof First note that the left side of 3.10 is bigger than or equal to the right side by 3.7 and 3.8.

To prove that the right side of 3.10 is bigger than or equal to the left side, first assume that $\inf_A f < \infty$ for every $A \in \mathcal{S}$ with $\mu(A) > 0$. Then for $P$ an $\mathcal{S}$-partition $A_1, \dots, A_m$ of nonempty subsets of $X$, take $c_j = \inf_{A_j} f$. This shows that $\mathcal{L}(f, P)$ is in the set on the right side of 3.10. If $\inf_{A_j} f = \infty$ for some $j$, then $\mu(A_j) = 0$ by our assumption, and taking $c_j = 0$ instead leaves the sum unchanged because $0 \cdot \infty = 0$. Thus the definition of $\int f \, d\mu$ shows that the right side of 3.10 is bigger than or equal to the left side.

The only remaining case to consider is when there exists a set $A \in \mathcal{S}$ such that $\mu(A) > 0$ and $\inf_A f = \infty$ [which implies that $f(x) = \infty$ for all $x \in A$]. In this case, for arbitrary $t \in (0, \infty)$ we can take $m = 1$, $A_1 = A$, and $c_1 = t$. These choices show that the right side of 3.10 is at least $t\mu(A)$. Because $t$ is an arbitrary positive number, this shows that the right side of 3.10 equals $\infty$, which of course is greater than or equal to the left side, completing the proof.


The next result allows us to interchange limits and integrals in certain circum- 
stances. We will see more theorems of this nature in the next section. 




3.11 Monotone Convergence Theorem 


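Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $0 \le f_1 \le f_2 \le \cdots$ is an increasing sequence of $\mathcal{S}$-measurable functions. Define $f\colon X \to [0, \infty]$ by
$$f(x) = \lim_{k \to \infty} f_k(x).$$
Then
$$\lim_{k \to \infty} \int f_k \, d\mu = \int f \, d\mu.$$

Proof The function $f$ is $\mathcal{S}$-measurable by 2.53.

Because $f_k(x) \le f(x)$ for every $x \in X$, we have $\int f_k \, d\mu \le \int f \, d\mu$ for each $k \in \mathbf{Z}^+$ (by 3.8). Thus $\lim_{k \to \infty} \int f_k \, d\mu \le \int f \, d\mu$.

To prove the inequality in the other direction, suppose $A_1, \dots, A_m$ are disjoint sets in $\mathcal{S}$ and $c_1, \dots, c_m \in [0, \infty)$ are such that

3.12 $$f(x) \ge \sum_{j=1}^{m} c_j \chi_{A_j}(x) \quad \text{for every } x \in X.$$

Let $t \in (0, 1)$. For $k \in \mathbf{Z}^+$, let
$$E_k = \Bigl\{x \in X : f_k(x) \ge t \sum_{j=1}^{m} c_j \chi_{A_j}(x)\Bigr\}.$$
Then $E_1 \subset E_2 \subset \cdots$ is an increasing sequence of sets in $\mathcal{S}$ whose union equals $X$. To see this, note that if $\sum_{j} c_j \chi_{A_j}(x) > 0$, then $f(x) > t \sum_{j} c_j \chi_{A_j}(x)$ and $f_k(x) \to f(x)$. Thus $\lim_{k \to \infty} \mu(A_j \cap E_k) = \mu(A_j)$ for each $j \in \{1, \dots, m\}$ (by 2.59).

If $k \in \mathbf{Z}^+$, then
$$f_k(x) \ge \sum_{j=1}^{m} t c_j \chi_{A_j \cap E_k}(x)$$
for every $x \in X$. Thus (by 3.9)
$$\int f_k \, d\mu \ge t \sum_{j=1}^{m} c_j \mu(A_j \cap E_k).$$
Taking the limit as $k \to \infty$ of both sides of the inequality above gives
$$\lim_{k \to \infty} \int f_k \, d\mu \ge t \sum_{j=1}^{m} c_j \mu(A_j).$$
Now taking the limit as $t$ increases to 1 shows that
$$\lim_{k \to \infty} \int f_k \, d\mu \ge \sum_{j=1}^{m} c_j \mu(A_j).$$

Taking the supremum of the inequality above over all disjoint sets $A_1, \dots, A_m$ in $\mathcal{S}$ and all $c_1, \dots, c_m \in [0, \infty)$ satisfying 3.12 shows (using 3.9) that $\lim_{k \to \infty} \int f_k \, d\mu \ge \int f \, d\mu$. Combined with the first inequality, this gives $\lim_{k \to \infty} \int f_k \, d\mu = \int f \, d\mu$, completing the proof.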
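Combining 2.89, 3.7, and the Monotone Convergence Theorem gives a way to compute an integral. Take Lebesgue measure $\lambda$ on $[0, 1]$ and $f(x) = x$. The functions $f_k$ from the proof of 2.89 increase to $f$. Each $f_k$ equals $\frac{m}{2^k}$ on $[\frac{m}{2^k}, \frac{m+1}{2^k})$ for $m = 0, \dots, 2^k - 1$, and the single point $x = 1$ has measure 0. By 3.7,
$$\int f_k \, d\lambda = \sum_{m=0}^{2^k-1} \frac{m}{2^k} \cdot \frac{1}{2^k} = \frac{2^k - 1}{2^{k+1}},$$
so the Monotone Convergence Theorem gives $\int f \, d\lambda = \frac{1}{2}$. The sketch below (helper name hypothetical) checks this arithmetic exactly with rational numbers.

```python
from fractions import Fraction


def integral_of_f_k(k):
    """Integral of f_k from 2.89 for f(x) = x on [0, 1] with Lebesgue measure.

    f_k = m/2^k on [m/2^k, (m+1)/2^k) for m = 0, ..., 2^k - 1, so 3.7 gives
    the sum of (m/2^k) * (length 1/2^k) over m."""
    step = Fraction(1, 2**k)
    return sum(m * step * step for m in range(2**k))


values = [integral_of_f_k(k) for k in range(1, 8)]
print(values)   # increasing toward 1/2, as the Monotone Convergence Theorem predicts
```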
The proof that the integral is additive will use the Monotone Convergence Theorem and our next result. The representation of a simple function $h\colon X \to [0, \infty]$ in the form $\sum_{k=1}^{n} c_k \chi_{E_k}$ is not unique. Requiring the numbers $c_1, \dots, c_n$ to be distinct and $E_1, \dots, E_n$ to be nonempty and disjoint with $E_1 \cup \cdots \cup E_n = X$ produces what is called the standard representation of a simple function [take $E_k = h^{-1}(\{c_k\})$, where $c_1, \dots, c_n$ are the distinct values of $h$]. The following lemma shows that all representations (including representations with sets that are not disjoint) of a simple measurable function give the same sum that we expect from integration.


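3.13 integral-type sums for simple functions

Suppose $(X, \mathcal{S}, \mu)$ is a measure space. Suppose $a_1, \dots, a_m, b_1, \dots, b_n \in [0, \infty]$ and $A_1, \dots, A_m, B_1, \dots, B_n \in \mathcal{S}$ are such that $\sum_{j=1}^m a_j \chi_{A_j} = \sum_{k=1}^n b_k \chi_{B_k}$. Then

$$\sum_{j=1}^m a_j \mu(A_j) = \sum_{k=1}^n b_k \mu(B_k).$$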


Proof We assume $A_1 \cup \cdots \cup A_m = X$ (otherwise add the term $0 \chi_{X \setminus (A_1 \cup \cdots \cup A_m)}$).

Suppose $A_1$ and $A_2$ are not disjoint. Then we can write

$$a_1 \chi_{A_1} + a_2 \chi_{A_2} = a_1 \chi_{A_1 \setminus A_2} + a_2 \chi_{A_2 \setminus A_1} + (a_1 + a_2) \chi_{A_1 \cap A_2}, \tag{3.14}$$

where the three sets appearing on the right side of the equation above are disjoint.

Now $A_1 = (A_1 \setminus A_2) \cup (A_1 \cap A_2)$ and $A_2 = (A_2 \setminus A_1) \cup (A_1 \cap A_2)$; each of these unions is a disjoint union. Thus $\mu(A_1) = \mu(A_1 \setminus A_2) + \mu(A_1 \cap A_2)$ and $\mu(A_2) = \mu(A_2 \setminus A_1) + \mu(A_1 \cap A_2)$. Hence

$$a_1 \mu(A_1) + a_2 \mu(A_2) = a_1 \mu(A_1 \setminus A_2) + a_2 \mu(A_2 \setminus A_1) + (a_1 + a_2) \mu(A_1 \cap A_2).$$

The equation above, in conjunction with 3.14, shows that if we replace the two sets $A_1, A_2$ by the three disjoint sets $A_1 \setminus A_2$, $A_2 \setminus A_1$, $A_1 \cap A_2$ and make the appropriate adjustments to the coefficients $a_1, \dots, a_m$, then the value of the sum $\sum_{j=1}^m a_j \mu(A_j)$ is unchanged (although $m$ has increased by 1).

Repeating this process with all pairs of subsets among $A_1, \dots, A_m$ that are not disjoint after each step, in a finite number of steps we can convert the initial list $A_1, \dots, A_m$ into a disjoint list of subsets without changing the value of $\sum_{j=1}^m a_j \mu(A_j)$.

The next step is to make the numbers $a_1, \dots, a_m$ distinct. This is done by replacing the sets corresponding to each $a_j$ by the union of those sets, and using finite additivity of the measure $\mu$ to show that the value of the sum $\sum_{j=1}^m a_j \mu(A_j)$ does not change.

Finally, drop any terms for which $A_j = \varnothing$, getting the standard representation for a simple function. We have now shown that the original value of $\sum_{j=1}^m a_j \mu(A_j)$ is equal to the value if we use the standard representation of the simple function $\sum_{j=1}^m a_j \chi_{A_j}$. The same procedure can be used with the representation $\sum_{k=1}^n b_k \chi_{B_k}$ to show that $\sum_{k=1}^n b_k \mu(B_k)$ equals what we would get with the standard representation. Thus the equality of the functions $\sum_{j=1}^m a_j \chi_{A_j}$ and $\sum_{k=1}^n b_k \chi_{B_k}$ implies the equality

$$\sum_{j=1}^m a_j \mu(A_j) = \sum_{k=1}^n b_k \mu(B_k).$$
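Lemma 3.13 can be checked by hand on a finite measure space. The sketch below is a hypothetical example, not from the text: $X = \{0, \dots, 5\}$ with assumed point masses, every subset measurable, and finite coefficients only (3.13 also allows the value $\infty$, which this sketch does not handle). It computes $\sum_j a_j \mu(A_j)$ for a representation whose sets overlap, for a disjoint representation of the same function, and for the standard representation built from $E_k = h^{-1}(\{c_k\})$. Exact rational arithmetic avoids any rounding questions.

```python
from fractions import Fraction

# Hypothetical finite measure space: X = {0, ..., 5} with mu({x}) = WEIGHTS[x].
WEIGHTS = {0: Fraction(1), 1: Fraction(1, 2), 2: Fraction(2),
           3: Fraction(1, 3), 4: Fraction(0), 5: Fraction(5)}
X = frozenset(WEIGHTS)


def mu(E):
    """Measure of a subset E of X: the sum of the point masses in E."""
    return sum((WEIGHTS[x] for x in E), Fraction(0))


def evaluate(rep, x):
    """Value at x of the simple function sum_j a_j * chi_{A_j}, where rep = [(a_j, A_j), ...]."""
    return sum((a for a, A in rep if x in A), Fraction(0))


def integral_type_sum(rep):
    """The sum a_1 mu(A_1) + ... + a_m mu(A_m) appearing in 3.13."""
    return sum((a * mu(A) for a, A in rep), Fraction(0))


def standard_representation(rep):
    """Pairs (c, h^{-1}({c})) over the distinct values c of h = sum_j a_j chi_{A_j}."""
    values = sorted({evaluate(rep, x) for x in X})
    return [(c, frozenset(x for x in X if evaluate(rep, x) == c)) for c in values]


# Two representations of the same function: overlapping sets versus disjoint sets.
rep1 = [(Fraction(2), frozenset({0, 1, 2})), (Fraction(3), frozenset({2, 3}))]
rep2 = [(Fraction(2), frozenset({0, 1})), (Fraction(5), frozenset({2})),
        (Fraction(3), frozenset({3}))]

if __name__ == "__main__":
    print(integral_type_sum(rep1), integral_type_sum(rep2),
          integral_type_sum(standard_representation(rep1)))  # prints: 14 14 14
```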
Now we can show that our definition of integration does the right thing with simple measurable functions that might not be expressed in the standard representation. The result below differs from 3.7 mainly because the sets $E_1, \dots, E_n$ in the result below are not required to be disjoint. Like the previous result, the next result would follow immediately from the linearity of integration if that property had already been proved.

If we had already proved that integration is linear, then we could quickly get the conclusion of the previous result by integrating both sides of the equation $\sum_{j=1}^m a_j \chi_{A_j} = \sum_{k=1}^n b_k \chi_{B_k}$ with respect to $\mu$. However, we need the previous result to prove the next result, which is used in our proof that integration is linear.


3.15 integral of a linear combination of characteristic functions

Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $E_1, \dots, E_n \in \mathcal{S}$, and $c_1, \dots, c_n \in [0, \infty]$. Then

$$\int \Bigl(\sum_{k=1}^n c_k \chi_{E_k}\Bigr) d\mu = \sum_{k=1}^n c_k \mu(E_k).$$

Proof The desired result follows from writing the simple function $\sum_{k=1}^n c_k \chi_{E_k}$ in the standard representation for a simple function and then using 3.7 and 3.13.


Now we can prove that integration is additive on nonnegative functions.


3.16 additivity of integration

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f, g\colon X \to [0, \infty]$ are $\mathcal{S}$-measurable functions. Then

$$\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu.$$

Proof The desired result holds for simple nonnegative $\mathcal{S}$-measurable functions (by 3.15). Thus we approximate by such functions.

Specifically, let $f_1, f_2, \dots$ and $g_1, g_2, \dots$ be increasing sequences of simple nonnegative $\mathcal{S}$-measurable functions such that

$$\lim_{k\to\infty} f_k(x) = f(x) \quad \text{and} \quad \lim_{k\to\infty} g_k(x) = g(x)$$

for all $x \in X$ (see 2.89 for the existence of such increasing sequences). Then

$$\begin{aligned}
\int (f + g) \, d\mu &= \lim_{k\to\infty} \int (f_k + g_k) \, d\mu \\
&= \lim_{k\to\infty} \int f_k \, d\mu + \lim_{k\to\infty} \int g_k \, d\mu \\
&= \int f \, d\mu + \int g \, d\mu,
\end{aligned}$$

where the first and third equalities follow from the Monotone Convergence Theorem and the second equality holds by 3.15.
The lower Riemann integral is not additive, even for bounded nonnegative measurable functions. For example, if $f = \chi_{\mathbf{Q} \cap [0,1]}$ and $g = \chi_{[0,1] \setminus \mathbf{Q}}$, then

$$L(f, [0,1]) = 0 \quad \text{and} \quad L(g, [0,1]) = 0 \quad \text{but} \quad L(f + g, [0,1]) = 1.$$

In contrast, if $\lambda$ is Lebesgue measure on the Borel subsets of $[0, 1]$, then

$$\int f \, d\lambda = 0 \quad \text{and} \quad \int g \, d\lambda = 1 \quad \text{and} \quad \int (f + g) \, d\lambda = 1.$$

More generally, we have just proved that $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$ for every measure $\mu$ and for all nonnegative measurable functions $f$ and $g$. Recall that integration with respect to a measure is defined via lower Lebesgue sums in a similar fashion to the definition of the lower Riemann integral via lower Riemann sums (with the big exception of allowing measurable sets instead of just intervals in the partitions). However, we have just seen that the integral with respect to a measure (which could have been called the lower Lebesgue integral) has considerably nicer behavior (additivity!) than the lower Riemann integral.


Integration of Real-Valued Functions 


The following definition gives us a standard way to write an arbitrary real-valued 
function as the difference of two nonnegative functions. 


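3.17 Definition $f^+$; $f^-$

Suppose $f\colon X \to [-\infty, \infty]$ is a function. Define functions $f^+$ and $f^-$ from $X$ to $[0, \infty]$ by

$$f^+(x) = \begin{cases} f(x) & \text{if } f(x) \ge 0, \\ 0 & \text{if } f(x) < 0 \end{cases} \quad \text{and} \quad f^-(x) = \begin{cases} 0 & \text{if } f(x) \ge 0, \\ -f(x) & \text{if } f(x) < 0. \end{cases}$$

Note that if $f\colon X \to [-\infty, \infty]$ is a function, then

$$f = f^+ - f^- \quad \text{and} \quad |f| = f^+ + f^-.$$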


The decomposition above allows us to extend our definition of integration to functions 
that take on negative as well as positive values. 


3.18 Definition integral of a real-valued function

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [-\infty, \infty]$ is an $\mathcal{S}$-measurable function such that at least one of $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ is finite. The integral of $f$ with respect to $\mu$, denoted $\int f \, d\mu$, is defined by

$$\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu.$$

If $f \ge 0$, then $f^+ = f$ and $f^- = 0$; thus this definition is consistent with the previous definition of the integral of a nonnegative function.
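Definitions 3.17 and 3.18 can be mirrored on a finite set. In the sketch below (an illustration under stated assumptions, not code from the text) the measure is given by hypothetical point masses $w(x) \in [0, \infty]$ on a finite set. Every function on a finite set is simple, so 3.15 gives $\int h \, d\mu = \sum_x w(x) h(x)$ for $h \ge 0$, with the convention $0 \cdot \infty = 0$. The function `integral` returns `None` exactly when both $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ are infinite, which is the situation of Example 3.19.

```python
import math


def positive_part(v: float) -> float:
    """Value of f+ at a point where f takes the value v."""
    return v if v >= 0 else 0.0


def negative_part(v: float) -> float:
    """Value of f- at a point where f takes the value v."""
    return 0.0 if v >= 0 else -v


def mul(a: float, b: float) -> float:
    """Product in [0, inf] using the measure-theory convention 0 * inf = 0."""
    return 0.0 if a == 0 or b == 0 else a * b


def integral_nonneg(h, weights):
    """Integral of h >= 0 for the measure with point masses weights[x]."""
    return sum(mul(w, h(x)) for x, w in weights.items())


def integral(f, weights):
    """Integral of a real-valued f as in 3.18, or None when it is undefined."""
    plus = integral_nonneg(lambda x: positive_part(f(x)), weights)
    minus = integral_nonneg(lambda x: negative_part(f(x)), weights)
    if math.isinf(plus) and math.isinf(minus):
        return None  # of the form inf - inf, so the integral is not defined
    return plus - minus


if __name__ == "__main__":
    w = {"a": 1.0, "b": 2.0, "c": math.inf}
    f = {"a": 3.0, "b": -1.0, "c": 0.0}
    print(integral(f.get, w))  # 3*1 - 1*2 + 0*inf = 1.0
```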




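The condition $\int |f| \, d\mu < \infty$ is equivalent to the condition $\int f^+ \, d\mu < \infty$ and $\int f^- \, d\mu < \infty$ (because $|f| = f^+ + f^-$, so $\int |f| \, d\mu = \int f^+ \, d\mu + \int f^- \, d\mu$ by 3.16).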


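3.19 Example a function whose integral is not defined

Suppose $\lambda$ is Lebesgue measure on $\mathbf{R}$ and $f\colon \mathbf{R} \to \mathbf{R}$ is the function defined by

$$f(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ -1 & \text{if } x < 0. \end{cases}$$

Then $\int f \, d\lambda$ is not defined because $\int f^+ \, d\lambda = \infty$ and $\int f^- \, d\lambda = \infty$.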


The next result says that the integral of a number times a function is exactly what 
we expect. 


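3.20 integration is homogeneous

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [-\infty, \infty]$ is a function such that $\int f \, d\mu$ is defined. If $c \in \mathbf{R}$, then

$$\int cf \, d\mu = c \int f \, d\mu.$$

Proof First consider the case where $f$ is a nonnegative function and $c \ge 0$. If $P$ is an $\mathcal{S}$-partition of $X$, then clearly $\mathcal{L}(cf, P) = c\,\mathcal{L}(f, P)$. Thus $\int cf \, d\mu = c \int f \, d\mu$.

Now consider the general case where $f$ takes values in $[-\infty, \infty]$. Suppose $c \ge 0$. Then

$$\begin{aligned}
\int cf \, d\mu &= \int (cf)^+ \, d\mu - \int (cf)^- \, d\mu \\
&= \int c f^+ \, d\mu - \int c f^- \, d\mu \\
&= c\Bigl(\int f^+ \, d\mu - \int f^- \, d\mu\Bigr) \\
&= c \int f \, d\mu,
\end{aligned}$$

where the third line follows from the first paragraph of this proof.

Finally, now suppose $c < 0$ (still assuming that $f$ takes values in $[-\infty, \infty]$). Then $-c > 0$ and

$$\begin{aligned}
\int cf \, d\mu &= \int (cf)^+ \, d\mu - \int (cf)^- \, d\mu \\
&= \int (-c) f^- \, d\mu - \int (-c) f^+ \, d\mu \\
&= (-c)\Bigl(\int f^- \, d\mu - \int f^+ \, d\mu\Bigr) \\
&= c \int f \, d\mu,
\end{aligned}$$

completing the proof.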
Now we prove that integration with respect to a measure has the additive property required for a good theory of integration.


3.21 additivity of integration

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f, g\colon X \to \mathbf{R}$ are $\mathcal{S}$-measurable functions such that $\int |f| \, d\mu < \infty$ and $\int |g| \, d\mu < \infty$. Then

$$\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu.$$

Proof Clearly

$$(f + g)^+ - (f + g)^- = f + g = f^+ - f^- + g^+ - g^-.$$

Thus

$$(f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+.$$

Both sides of the equation above are sums of nonnegative functions. Thus integrating both sides with respect to $\mu$ and using 3.16 gives

$$\int (f + g)^+ \, d\mu + \int f^- \, d\mu + \int g^- \, d\mu = \int (f + g)^- \, d\mu + \int f^+ \, d\mu + \int g^+ \, d\mu.$$

Rearranging the equation above gives

$$\int (f + g)^+ \, d\mu - \int (f + g)^- \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu + \int g^+ \, d\mu - \int g^- \, d\mu,$$

where the left side is not of the form $\infty - \infty$ because $(f + g)^+ \le f^+ + g^+$ and $(f + g)^- \le f^- + g^-$. The equation above can be rewritten as

$$\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu,$$

completing the proof.

Gottfried Leibniz (1646–1716) invented the symbol $\int$ to denote integration in 1675.


The next result resembles 3.8, but now the functions are allowed to be real valued.


3.22 integration is order preserving

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f, g\colon X \to \mathbf{R}$ are $\mathcal{S}$-measurable functions such that $\int f \, d\mu$ and $\int g \, d\mu$ are defined. Suppose also that $f(x) \le g(x)$ for all $x \in X$. Then $\int f \, d\mu \le \int g \, d\mu$.

Proof The cases where $\int f \, d\mu = \pm\infty$ or $\int g \, d\mu = \pm\infty$ are left to the reader. Thus we assume that $\int |f| \, d\mu < \infty$ and $\int |g| \, d\mu < \infty$.

The additivity (3.21) and homogeneity (3.20 with $c = -1$) of integration imply that

$$\int g \, d\mu - \int f \, d\mu = \int (g - f) \, d\mu.$$

The last integral is nonnegative because $g(x) - f(x) \ge 0$ for all $x \in X$.
The inequality in the next result receives frequent use.


3.23 absolute value of integral $\le$ integral of absolute value

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [-\infty, \infty]$ is a function such that $\int f \, d\mu$ is defined. Then

$$\Bigl|\int f \, d\mu\Bigr| \le \int |f| \, d\mu.$$

Proof Because $\int f \, d\mu$ is defined, $f$ is an $\mathcal{S}$-measurable function and at least one of $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ is finite. Thus

$$\begin{aligned}
\Bigl|\int f \, d\mu\Bigr| &= \Bigl|\int f^+ \, d\mu - \int f^- \, d\mu\Bigr| \\
&\le \int f^+ \, d\mu + \int f^- \, d\mu \\
&= \int (f^+ + f^-) \, d\mu \\
&= \int |f| \, d\mu,
\end{aligned}$$

as desired.


EXERCISES 3A 


1 Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [0, \infty]$ is an $\mathcal{S}$-measurable function such that $\int f \, d\mu < \infty$. Explain why

$$\inf_E f = 0$$

for each set $E \in \mathcal{S}$ with $\mu(E) = \infty$.
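2 Suppose $X$ is a set, $\mathcal{S}$ is a $\sigma$-algebra on $X$, and $c \in X$. Define the Dirac measure $\delta_c$ on $(X, \mathcal{S})$ by

$$\delta_c(E) = \begin{cases} 1 & \text{if } c \in E, \\ 0 & \text{if } c \notin E. \end{cases}$$

Prove that if $f\colon X \to [0, \infty]$ is $\mathcal{S}$-measurable, then $\int f \, d\delta_c = f(c)$.

[Careful: $\{c\}$ may not be in $\mathcal{S}$.]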


3 Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [0, \infty]$ is an $\mathcal{S}$-measurable function. Prove that

$$\int f \, d\mu > 0 \quad \text{if and only if} \quad \mu(\{x \in X : f(x) > 0\}) > 0.$$


4 Give an example of a Borel measurable function $f\colon [0, 1] \to (0, \infty)$ such that $L(f, [0, 1]) = 0$.

[Recall that $L(f, [0, 1])$ denotes the lower Riemann integral, which was defined in Section 1A. If $\lambda$ is Lebesgue measure on $[0, 1]$, then the previous exercise states that $\int f \, d\lambda > 0$ for this function $f$, which is what we expect of a positive function. Thus even though both $L(f, [0, 1])$ and $\int f \, d\lambda$ are defined by taking the supremum of approximations from below, Lebesgue measure captures the right behavior for this function $f$ and the lower Riemann integral does not.]


5 Verify the assertion that integration with respect to counting measure is summation (Example 3.6).


6 Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $f\colon X \to [0, \infty]$ is $\mathcal{S}$-measurable, and $P$ and $P'$ are $\mathcal{S}$-partitions of $X$ such that each set in $P'$ is contained in some set in $P$. Prove that $\mathcal{L}(f, P) \le \mathcal{L}(f, P')$.


7 Suppose $X$ is a set, $\mathcal{S}$ is the $\sigma$-algebra of all subsets of $X$, and $w\colon X \to [0, \infty]$ is a function. Define a measure $\mu$ on $(X, \mathcal{S})$ by

$$\mu(E) = \sum_{x \in E} w(x)$$

for $E \subseteq X$. Prove that if $f\colon X \to [0, \infty]$ is a function, then

$$\int f \, d\mu = \sum_{x \in X} w(x) f(x),$$

where the infinite sums above are defined as the supremum of all sums over finite subsets of $X$.


8 Suppose $\lambda$ denotes Lebesgue measure on $\mathbf{R}$. Give an example of a sequence $f_1, f_2, \dots$ of simple Borel measurable functions from $\mathbf{R}$ to $[0, \infty)$ such that $\lim_{k\to\infty} f_k(x) = 0$ for every $x \in \mathbf{R}$ but $\lim_{k\to\infty} \int f_k \, d\lambda = 1$.


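9 Suppose $\mu$ is a measure on a measurable space $(X, \mathcal{S})$ and $f\colon X \to [0, \infty]$ is an $\mathcal{S}$-measurable function. Define $\nu\colon \mathcal{S} \to [0, \infty]$ by

$$\nu(A) = \int \chi_A f \, d\mu$$

for $A \in \mathcal{S}$. Prove that $\nu$ is a measure on $(X, \mathcal{S})$.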


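10 Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f_1, f_2, \dots$ is a sequence of nonnegative $\mathcal{S}$-measurable functions. Define $f\colon X \to [0, \infty]$ by $f(x) = \sum_{k=1}^\infty f_k(x)$. Prove that

$$\int f \, d\mu = \sum_{k=1}^\infty \int f_k \, d\mu.$$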


11 Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f_1, f_2, \dots$ are $\mathcal{S}$-measurable functions from $X$ to $\mathbf{R}$ such that $\sum_{k=1}^\infty \int |f_k| \, d\mu < \infty$. Prove that there exists $E \in \mathcal{S}$ such that $\mu(X \setminus E) = 0$ and $\lim_{k\to\infty} f_k(x) = 0$ for every $x \in E$.


12 Show that there exists a Borel measurable function $f\colon \mathbf{R} \to (0, \infty)$ such that $\int \chi_I f \, d\lambda = \infty$ for every nonempty open interval $I \subseteq \mathbf{R}$, where $\lambda$ denotes Lebesgue measure on $\mathbf{R}$.


13 Give an example to show that the Monotone Convergence Theorem (3.11) can fail if the hypothesis that $f_1, f_2, \dots$ are nonnegative functions is dropped.


14 Give an example to show that the Monotone Convergence Theorem can fail if
the hypothesis of an increasing sequence of functions is replaced by a hypothesis 
of a decreasing sequence of functions. 

[This exercise shows that the Monotone Convergence Theorem should be called 
the Increasing Convergence Theorem. However, see Exercise 20.] 


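15 Suppose $\lambda$ is Lebesgue measure on $\mathbf{R}$ and $f\colon \mathbf{R} \to [-\infty, \infty]$ is a Borel measurable function such that $\int f \, d\lambda$ is defined.

(a) For $t \in \mathbf{R}$, define $f_t\colon \mathbf{R} \to [-\infty, \infty]$ by $f_t(x) = f(x - t)$. Prove that $\int f_t \, d\lambda = \int f \, d\lambda$ for all $t \in \mathbf{R}$.

(b) For $t \in \mathbf{R} \setminus \{0\}$, define $f_t\colon \mathbf{R} \to [-\infty, \infty]$ by $f_t(x) = f(tx)$. Prove that $\int f_t \, d\lambda = \frac{1}{|t|} \int f \, d\lambda$ for all $t \in \mathbf{R} \setminus \{0\}$.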


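16 Suppose $\mathcal{S}$ and $\mathcal{T}$ are $\sigma$-algebras on a set $X$ and $\mathcal{S} \subseteq \mathcal{T}$. Suppose $\mu_1$ is a measure on $(X, \mathcal{S})$, $\mu_2$ is a measure on $(X, \mathcal{T})$, and $\mu_1(E) = \mu_2(E)$ for all $E \in \mathcal{S}$. Prove that if $f\colon X \to [0, \infty]$ is $\mathcal{S}$-measurable, then $\int f \, d\mu_1 = \int f \, d\mu_2$.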


17 For $x_1, x_2, \dots$ a sequence in $[-\infty, \infty]$, define $\liminf_{k\to\infty} x_k$ by

$$\liminf_{k\to\infty} x_k = \lim_{k\to\infty} \inf\{x_k, x_{k+1}, \dots\}.$$

Note that $\inf\{x_k, x_{k+1}, \dots\}$ is an increasing function of $k$; thus the limit above on the right exists in $[-\infty, \infty]$.

Suppose that $(X, \mathcal{S}, \mu)$ is a measure space and $f_1, f_2, \dots$ is a sequence of nonnegative $\mathcal{S}$-measurable functions on $X$. Define a function $f\colon X \to [0, \infty]$ by

$$f(x) = \liminf_{k\to\infty} f_k(x).$$

(a) Show that $f$ is an $\mathcal{S}$-measurable function.

(b) Prove that

$$\int f \, d\mu \le \liminf_{k\to\infty} \int f_k \, d\mu.$$

(c) Give an example showing that the inequality in (b) can be a strict inequality even when $\mu(X) < \infty$ and the family of functions $\{f_k\}_{k \in \mathbf{Z}^+}$ is uniformly bounded.

[The result in (b) is called Fatou’s Lemma. Some textbooks prove Fatou’s Lemma and then use it to prove the Monotone Convergence Theorem. Here we are taking the reverse approach: you should be able to use the Monotone Convergence Theorem to give a clean proof of Fatou’s Lemma.]
18 Give an example of a sequence $x_1, x_2, \dots$ of real numbers such that

$$\lim_{n\to\infty} \sum_{k=1}^n x_k \text{ exists in } \mathbf{R},$$

but $\int x \, d\mu$ is not defined, where $\mu$ is counting measure on $\mathbf{Z}^+$ and $x$ is the function from $\mathbf{Z}^+$ to $\mathbf{R}$ defined by $x(k) = x_k$.


19 Show that if $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to [0, \infty)$ is $\mathcal{S}$-measurable, then

$$\mu(X) \inf_X f \le \int f \, d\mu \le \mu(X) \sup_X f.$$


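20 Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f_1, f_2, \dots$ is a monotone (meaning either increasing or decreasing) sequence of $\mathcal{S}$-measurable functions. Define $f\colon X \to [-\infty, \infty]$ by

$$f(x) = \lim_{k\to\infty} f_k(x).$$

Prove that if $\int |f_1| \, d\mu < \infty$, then

$$\lim_{k\to\infty} \int f_k \, d\mu = \int f \, d\mu.$$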
k- 00 


21 Henri Lebesgue wrote the following about his method of integration:


I have to pay a certain sum, which I have collected in my pocket. I
take the bills and coins out of my pocket and give them to the creditor 
in the order I find them until I have reached the total sum. This is the 
Riemann integral. But I can proceed differently. After I have taken 
all the money out of my pocket I order the bills and coins according 
to identical values and then I pay the several heaps one after the other 
to the creditor. This is my integral. 


Use 3.15 to explain what Lebesgue meant and to explain why integration of 
a function with respect to a measure can be thought of as partitioning the 
range of the function, in contrast to Riemann integration, which depends upon 
partitioning the domain of the function. 

[The quote above is taken from page 796 of The Princeton Companion to 
Mathematics, edited by Timothy Gowers.] 
3B Limits of Integrals & Integrals of Limits


This section focuses on interchanging limits and integrals. Those tools allow us to 
characterize the Riemann integrable functions in terms of Lebesgue measure. We 
also develop some good approximation tools that will be useful in later chapters. 


Bounded Convergence Theorem 


We begin this section by introducing some useful notation. 


3.24 Definition integration on a subset

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $E \in \mathcal{S}$. If $f\colon X \to [-\infty, \infty]$ is an $\mathcal{S}$-measurable function, then $\int_E f \, d\mu$ is defined by

$$\int_E f \, d\mu = \int \chi_E f \, d\mu$$

if the right side of the equation above is defined; otherwise $\int_E f \, d\mu$ is undefined.

Alternatively, you can think of $\int_E f \, d\mu$ as $\int f|_E \, d\mu_E$, where $\mu_E$ is the measure obtained by restricting $\mu$ to the elements of $\mathcal{S}$ that are contained in $E$.

Notice that according to the definition above, the notation $\int_X f \, d\mu$ means the same as $\int f \, d\mu$. The following easy result illustrates the use of this new notation.


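3.25 bounding an integral

Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $E \in \mathcal{S}$, and $f\colon X \to [-\infty, \infty]$ is a function such that $\int_E f \, d\mu$ is defined. Then

$$\Bigl|\int_E f \, d\mu\Bigr| \le \mu(E) \sup_E |f|.$$

Proof Let $c = \sup_E |f|$. We have

$$\begin{aligned}
\Bigl|\int_E f \, d\mu\Bigr| &= \Bigl|\int \chi_E f \, d\mu\Bigr| \\
&\le \int \chi_E |f| \, d\mu \\
&\le \int c \chi_E \, d\mu \\
&= c\,\mu(E),
\end{aligned}$$

where the second line comes from 3.23, the third line comes from 3.8, and the fourth line comes from 3.15.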
The next result could be proved as a special case of the Dominated Convergence
Theorem (3.31), which we prove later in this section. Thus you could skip the proof 
here. However, sometimes you get more insight by seeing an easier proof of an 
important special case. Thus you may want to read the easy proof of the Bounded 
Convergence Theorem that is presented next. 


3.26 Bounded Convergence Theorem

Suppose $(X, \mathcal{S}, \mu)$ is a measure space with $\mu(X) < \infty$. Suppose $f_1, f_2, \dots$ is a sequence of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{R}$ that converges pointwise on $X$ to a function $f\colon X \to \mathbf{R}$. If there exists $c \in (0, \infty)$ such that

$$|f_k(x)| \le c$$

for all $k \in \mathbf{Z}^+$ and all $x \in X$, then

$$\lim_{k\to\infty} \int f_k \, d\mu = \int f \, d\mu.$$


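Proof The function $f$ is $\mathcal{S}$-measurable by 2.48.

Suppose $c$ satisfies the hypothesis of this theorem. Let $\varepsilon > 0$. By Egorov’s Theorem (2.85), there exists $E \in \mathcal{S}$ such that $\mu(X \setminus E) < \frac{\varepsilon}{4c}$ and $f_1, f_2, \dots$ converges uniformly to $f$ on $E$. Now

$$\begin{aligned}
\Bigl|\int f_k \, d\mu - \int f \, d\mu\Bigr| &= \Bigl|\int_{X \setminus E} f_k \, d\mu - \int_{X \setminus E} f \, d\mu + \int_E (f_k - f) \, d\mu\Bigr| \\
&\le \int_{X \setminus E} |f_k| \, d\mu + \int_{X \setminus E} |f| \, d\mu + \int_E |f_k - f| \, d\mu \\
&< \frac{\varepsilon}{2} + \mu(E) \sup_E |f_k - f|,
\end{aligned}$$

where the last inequality follows from 3.25 (and from $|f_k| \le c$ and $|f| \le c$ on $X \setminus E$). Because $f_1, f_2, \dots$ converges uniformly to $f$ on $E$ and $\mu(E) < \infty$, the right side of the inequality above is less than $\varepsilon$ for $k$ sufficiently large, which completes the proof.

Note the key role of Egorov’s Theorem, which states that pointwise convergence is close to uniform convergence, in proofs involving interchanging limits and integrals.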
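A standard example (our choice, not from the text) shows both the theorem and the role of Egorov’s Theorem. On $X = [0, 1]$ with Lebesgue measure, $f_k(x) = x^k$ satisfies $|f_k| \le 1$ and converges pointwise to $f = \chi_{\{1\}}$, so $\int f \, d\lambda = 0$, while $\int f_k \, d\lambda = 1/(k + 1)$. The convergence is not uniform on $[0, 1)$, but it is uniform on $E = [0, 1 - \eta]$ for each $\eta \in (0, 1)$. The sketch evaluates the bound $2c\,\lambda(X \setminus E) + \lambda(E) \sup_E |f_k - f|$ that the proof relies on, with $c = 1$; the parameter name `eta` and the function names are hypothetical.

```python
def integral_exact(k: int) -> float:
    """Integral of x**k over [0, 1] with respect to Lebesgue measure."""
    return 1.0 / (k + 1)


def sup_on_good_set(k: int, eta: float) -> float:
    """sup of |f_k - f| = x**k over E = [0, 1 - eta], where f = 0 on E."""
    return (1.0 - eta) ** k


def bct_bound(k: int, eta: float, c: float = 1.0) -> float:
    """The bound 2*c*|X \\ E| + |E| * sup_E |f_k - f| from the proof, with X = [0, 1]."""
    return 2.0 * c * eta + (1.0 - eta) * sup_on_good_set(k, eta)


if __name__ == "__main__":
    # The exact integrals tend to 0 and always stay below the proof's bound.
    for k in (1, 10, 100, 1000):
        print(k, integral_exact(k), bct_bound(k, eta=0.01))
```

Shrinking `eta` makes the first term of the bound small, and then increasing `k` makes the second term small, mirroring the order of choices in the proof.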


Sets of Measure 0 in Integration Theorems

Suppose $(X, \mathcal{S}, \mu)$ is a measure space. If $f, g\colon X \to [-\infty, \infty]$ are $\mathcal{S}$-measurable functions and

$$\mu(\{x \in X : f(x) \ne g(x)\}) = 0,$$

then the definition of an integral implies that $\int f \, d\mu = \int g \, d\mu$ (or both integrals are undefined). Because what happens on a set of measure 0 often does not matter, the following definition is useful.
3.27 Definition almost every

Suppose $(X, \mathcal{S}, \mu)$ is a measure space. A set $E \in \mathcal{S}$ is said to contain $\mu$-almost every element of $X$ if $\mu(X \setminus E) = 0$. If the measure $\mu$ is clear from the context, then the phrase almost every can be used (abbreviated by some authors to a. e.).

For example, almost every real number is irrational (with respect to the usual Lebesgue measure on $\mathbf{R}$) because $|\mathbf{Q}| = 0$.

Theorems about integrals can almost always be relaxed so that the hypotheses apply only almost everywhere instead of everywhere. For example, consider the Bounded Convergence Theorem (3.26), one of whose hypotheses is that

$$\lim_{k\to\infty} f_k(x) = f(x)$$

for all $x \in X$. Suppose that the hypotheses of the Bounded Convergence Theorem hold except that the equation above holds only almost everywhere, meaning there is a set $E \in \mathcal{S}$ such that $\mu(X \setminus E) = 0$ and the equation above holds for all $x \in E$. Define new functions $g_1, g_2, \dots$ and $g$ by

$$g_k(x) = \begin{cases} f_k(x) & \text{if } x \in E, \\ 0 & \text{if } x \in X \setminus E \end{cases} \quad \text{and} \quad g(x) = \begin{cases} f(x) & \text{if } x \in E, \\ 0 & \text{if } x \in X \setminus E. \end{cases}$$

Then

$$\lim_{k\to\infty} g_k(x) = g(x)$$

for all $x \in X$. Hence the Bounded Convergence Theorem implies that

$$\lim_{k\to\infty} \int g_k \, d\mu = \int g \, d\mu,$$

which immediately implies that

$$\lim_{k\to\infty} \int f_k \, d\mu = \int f \, d\mu$$

because $\int g_k \, d\mu = \int f_k \, d\mu$ and $\int g \, d\mu = \int f \, d\mu$.


Dominated Convergence Theorem 


The next result tells us that if a nonnegative function has a finite integral, then its 
integral over all small sets (in the sense of measure) is small. 


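3.28 integrals on small sets are small

Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $g\colon X \to [0, \infty]$ is $\mathcal{S}$-measurable, and $\int g \, d\mu < \infty$. Then for every $\varepsilon > 0$, there exists $\delta > 0$ such that

$$\int_B g \, d\mu < \varepsilon$$

for every set $B \in \mathcal{S}$ such that $\mu(B) < \delta$.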
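Proof Suppose $\varepsilon > 0$. Let $h\colon X \to [0, \infty)$ be a simple $\mathcal{S}$-measurable function such that $0 \le h \le g$ and

$$\int g \, d\mu - \int h \, d\mu < \frac{\varepsilon}{2};$$

the existence of a function $h$ with these properties follows from 3.9. Let

$$H = \max\{h(x) : x \in X\}$$

and let $\delta > 0$ be such that $H\delta < \frac{\varepsilon}{2}$.

Suppose $B \in \mathcal{S}$ and $\mu(B) < \delta$. Then

$$\begin{aligned}
\int_B g \, d\mu &= \int_B (g - h) \, d\mu + \int_B h \, d\mu \\
&\le \int (g - h) \, d\mu + H\mu(B) \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} \\
&= \varepsilon,
\end{aligned}$$

as desired.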
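For some unbounded integrable functions the $\delta$ in 3.28 can be written down explicitly. In the following sketch (a hypothetical example, not from the text) $g(x) = x^{-1/2}$ on $(0, 1]$ with Lebesgue measure, so $\int g \, d\lambda = 2$. Because $g$ is decreasing, among Borel sets $B \subseteq (0, 1]$ of a given measure $t$ the integral $\int_B g \, d\lambda$ is largest for $B = (0, t]$, where it equals $2\sqrt{t}$; hence $\delta = (\varepsilon/2)^2$ works for $0 < \varepsilon \le 2$. In general $\delta$ depends on $g$, and the proof above obtains it from a simple function $h$ rather than from a formula.

```python
import math


def worst_case_integral(t: float) -> float:
    """Largest integral of g(x) = x**-0.5 over a Borel B in (0, 1] with |B| = t <= 1.

    Since g is decreasing, the maximum is attained at B = (0, t], giving 2*sqrt(t).
    """
    return 2.0 * math.sqrt(t)


def delta_for(eps: float) -> float:
    """A delta for 3.28 with this g: |B| < delta implies the integral over B is < eps."""
    return (eps / 2.0) ** 2
```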


Some theorems, such as Egorov’s Theorem (2.85), have as a hypothesis that the measure of the entire space is finite. The next result sometimes allows us to get around this hypothesis by restricting attention to a key set of finite measure.


3.29 integrable functions live mostly on sets of finite measure

Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $g\colon X \to [0, \infty]$ is $\mathcal{S}$-measurable, and $\int g \, d\mu < \infty$. Then for every $\varepsilon > 0$, there exists $E \in \mathcal{S}$ such that $\mu(E) < \infty$ and

$$\int_{X \setminus E} g \, d\mu < \varepsilon.$$

Proof Suppose $\varepsilon > 0$. Let $P$ be an $\mathcal{S}$-partition $A_1, \dots, A_m$ of $X$ such that

$$\int g \, d\mu < \varepsilon + \mathcal{L}(g, P). \tag{3.30}$$

Let $E$ be the union of those $A_j$ such that $\inf_{A_j} g > 0$. Then $\mu(E) < \infty$ (because otherwise we would have $\mathcal{L}(g, P) = \infty$, which contradicts the hypothesis that $\int g \, d\mu < \infty$). Now

$$\begin{aligned}
\int_{X \setminus E} g \, d\mu &= \int g \, d\mu - \int \chi_E g \, d\mu \\
&< \bigl(\varepsilon + \mathcal{L}(g, P)\bigr) - \mathcal{L}(\chi_E g, P) \\
&= \varepsilon,
\end{aligned}$$

where the second line follows from 3.30 and the definition of the integral of a nonnegative function, and the last line holds because $\inf_{A_j} g = 0$ for each $A_j$ not contained in $E$.
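Here is a hypothetical numerical instance of 3.29, not taken from the text. Take $g(x) = e^{-|x|}$ on $\mathbf{R}$ with Lebesgue measure, so $\int g \, d\lambda = 2$, and choose $E = [-n, n]$. Then $\int_{\mathbf{R} \setminus E} g \, d\lambda = 2e^{-n}$, which is less than $\varepsilon$ as soon as $n > \log(2/\varepsilon)$. The sketch picks such an $n$; the extra $+1$ is an arbitrary safety margin, and the function names are our own.

```python
import math


def tail_integral(n: float) -> float:
    """Integral of exp(-|x|) over R \\ [-n, n] for n >= 0."""
    return 2.0 * math.exp(-n)


def finite_measure_set(eps: float) -> tuple[float, float]:
    """Return (n, |E|) for E = [-n, n] chosen so that the tail integral is < eps."""
    n = max(0.0, math.log(2.0 / eps)) + 1.0  # the +1 keeps the inequality strict
    return n, 2.0 * n
```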
Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f_1, f_2, \dots$ is a sequence of $\mathcal{S}$-measurable functions on $X$ such that $\lim_{k\to\infty} f_k(x) = f(x)$ for every (or almost every) $x \in X$. In general, it is not true that $\lim_{k\to\infty} \int f_k \, d\mu = \int f \, d\mu$ (see Exercises 1 and 2).

We already have two good theorems about interchanging limits and integrals. 
However, both of these theorems have restrictive hypotheses. Specifically, the Mono- 
tone Convergence Theorem (3.11) requires all the functions to be nonnegative and 
it requires the sequence of functions to be increasing. The Bounded Convergence 
Theorem (3.26) requires the measure of the whole space to be finite and it requires 
the sequence of functions to be uniformly bounded by a constant. 

The next theorem is the grand result in this area. It does not require the sequence 
of functions to be nonnegative, it does not require the sequence of functions to 
be increasing, it does not require the measure of the whole space to be finite, and 
it does not require the sequence of functions to be uniformly bounded. All these 
hypotheses are replaced only by a requirement that the sequence of functions is 
pointwise bounded by a function with a finite integral. 

Notice that the Bounded Convergence Theorem follows immediately from the result below (take $g$ to be an appropriate constant function and use the hypothesis in the Bounded Convergence Theorem that $\mu(X) < \infty$).


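3.31 Dominated Convergence Theorem

Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $f\colon X \to [-\infty, \infty]$ is $\mathcal{S}$-measurable, and $f_1, f_2, \dots$ are $\mathcal{S}$-measurable functions from $X$ to $[-\infty, \infty]$ such that

$$\lim_{k\to\infty} f_k(x) = f(x)$$

for almost every $x \in X$. If there exists an $\mathcal{S}$-measurable function $g\colon X \to [0, \infty]$ such that

$$\int g \, d\mu < \infty \quad \text{and} \quad |f_k(x)| \le g(x)$$

for every $k \in \mathbf{Z}^+$ and almost every $x \in X$, then

$$\lim_{k\to\infty} \int f_k \, d\mu = \int f \, d\mu.$$

Proof Suppose $g\colon X \to [0, \infty]$ satisfies the hypotheses of this theorem. If $E \in \mathcal{S}$, then

$$\begin{aligned}
\Bigl|\int f_k \, d\mu - \int f \, d\mu\Bigr| &= \Bigl|\int_{X \setminus E} f_k \, d\mu + \int_E f_k \, d\mu - \int_{X \setminus E} f \, d\mu - \int_E f \, d\mu\Bigr| \\
&\le \Bigl|\int_{X \setminus E} f_k \, d\mu\Bigr| + \Bigl|\int_{X \setminus E} f \, d\mu\Bigr| + \Bigl|\int_E f_k \, d\mu - \int_E f \, d\mu\Bigr| \\
&\le 2\int_{X \setminus E} g \, d\mu + \Bigl|\int_E f_k \, d\mu - \int_E f \, d\mu\Bigr|.
\end{aligned} \tag{3.32}$$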
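Case 1: Suppose $\mu(X) < \infty$.

Let $\varepsilon > 0$. By 3.28, there exists $\delta > 0$ such that

$$\int_B g \, d\mu < \frac{\varepsilon}{4} \tag{3.33}$$

for every set $B \in \mathcal{S}$ such that $\mu(B) < \delta$. By Egorov’s Theorem (2.85), there exists a set $E \in \mathcal{S}$ such that $\mu(X \setminus E) < \delta$ and $f_1, f_2, \dots$ converges uniformly to $f$ on $E$. Now 3.32 and 3.33 imply that

$$\Bigl|\int f_k \, d\mu - \int f \, d\mu\Bigr| < \frac{\varepsilon}{2} + \Bigl|\int_E (f_k - f) \, d\mu\Bigr|.$$

Because $f_1, f_2, \dots$ converges uniformly to $f$ on $E$ and $\mu(E) < \infty$, the last term on the right is less than $\frac{\varepsilon}{2}$ for all sufficiently large $k$. Thus $\lim_{k\to\infty} \int f_k \, d\mu = \int f \, d\mu$, completing the proof of case 1.

Case 2: Suppose $\mu(X) = \infty$.

Let $\varepsilon > 0$. By 3.29, there exists $E \in \mathcal{S}$ such that $\mu(E) < \infty$ and

$$\int_{X \setminus E} g \, d\mu < \frac{\varepsilon}{4}.$$

The inequality above and 3.32 imply that

$$\Bigl|\int f_k \, d\mu - \int f \, d\mu\Bigr| < \frac{\varepsilon}{2} + \Bigl|\int_E f_k \, d\mu - \int_E f \, d\mu\Bigr|.$$

By case 1 as applied to the sequence $f_1|_E, f_2|_E, \dots$, the last term on the right is less than $\frac{\varepsilon}{2}$ for all sufficiently large $k$. Thus $\lim_{k\to\infty} \int f_k \, d\mu = \int f \, d\mu$, completing the proof of case 2.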
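The following sketch contrasts a dominated and an undominated sequence on $[0, \infty)$ with Lebesgue measure. The examples are our own choice, not from the text. First, $f_k(x) = e^{-x}\cos(x/k)$ satisfies $|f_k| \le g$ with $g(x) = e^{-x}$ and $\int g \, d\lambda = 1$, and $f_k \to e^{-x}$ pointwise; the standard formula $\int_0^\infty e^{-x}\cos(ax)\,dx = 1/(1 + a^2)$ gives $\int f_k \, d\lambda = k^2/(k^2 + 1) \to 1$, in agreement with 3.31. Second, the moving bump $\chi_{[k, k+1]}$ converges to 0 at every point, yet each integral equals 1. No integrable $g$ dominates that whole sequence, since any such $g$ would have to be at least 1 almost everywhere on $[1, \infty)$. The midpoint rule is only a numerical cross-check, and the cutoff `upper` is an assumption (the neglected tail is at most $e^{-\text{upper}}$).

```python
import math


def dominated_integral_exact(k: int) -> float:
    """Integral over [0, inf) of exp(-x) * cos(x / k), which equals k**2 / (k**2 + 1)."""
    return k * k / (k * k + 1.0)


def dominated_integral_midpoint(k: int, upper: float = 40.0, n: int = 200_000) -> float:
    """Midpoint-rule approximation on [0, upper]; the ignored tail is at most exp(-upper)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.exp(-x) * math.cos(x / k)
    return total * h


def moving_bump(k: int, x: float) -> float:
    """Indicator of [k, k + 1]: its integral is 1 for every k, yet it tends to 0 at each x."""
    return 1.0 if k <= x <= k + 1 else 0.0
```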


Riemann Integrals and Lebesgue Integrals 


We can now use the tools we have developed to characterize the Riemann integrable 
functions. In the theorem below, the left side of the last equation denotes the Riemann 
integral. 


3.34 Riemann integrable $\iff$ continuous almost everywhere

Suppose $a < b$ and $f\colon [a, b] \to \mathbf{R}$ is a bounded function. Then $f$ is Riemann integrable if and only if

$$|\{x \in [a, b] : f \text{ is not continuous at } x\}| = 0.$$

Furthermore, if $f$ is Riemann integrable and $\lambda$ denotes Lebesgue measure on $\mathbf{R}$, then $f$ is Lebesgue measurable and

$$\int_a^b f = \int_{[a, b]} f \, d\lambda.$$
Proof Suppose $n \in \mathbf{Z}^+$. Consider the partition $P_n$ that divides $[a, b]$ into $2^n$ subintervals of equal size. Let $I_1, \dots, I_{2^n}$ be the corresponding closed subintervals, each of length $(b - a)/2^n$. Let

$$g_n = \sum_{j=1}^{2^n} \Bigl(\inf_{I_j} f\Bigr) \chi_{I_j} \quad \text{and} \quad h_n = \sum_{j=1}^{2^n} \Bigl(\sup_{I_j} f\Bigr) \chi_{I_j}. \tag{3.35}$$

The lower and upper Riemann sums of $f$ for the partition $P_n$ are given by integrals. Specifically,

$$L(f, P_n, [a, b]) = \int_{[a, b]} g_n \, d\lambda \quad \text{and} \quad U(f, P_n, [a, b]) = \int_{[a, b]} h_n \, d\lambda, \tag{3.36}$$

where $\lambda$ is Lebesgue measure on $\mathbf{R}$.

The definitions of $g_n$ and $h_n$ given in 3.35 are actually just a first draft of the definitions. A slight problem arises at each point that is in two of the intervals $I_1, \dots, I_{2^n}$ (in other words, at endpoints of these intervals other than $a$ and $b$). At each of these points, change the value of $g_n$ to be the infimum of $f$ over the union of the two intervals that contain the point, and change the value of $h_n$ to be the supremum of $f$ over the union of the two intervals that contain the point. This change modifies $g_n$ and $h_n$ on only a finite number of points. Thus the integrals in 3.36 are not affected. This change is needed in order to make 3.38 true (otherwise the two sets in 3.38 might differ by at most countably many points, which would not really change the proof but which would not be as aesthetically pleasing).

Clearly $g_1 \le g_2 \le \cdots$ is an increasing sequence of functions and $h_1 \ge h_2 \ge \cdots$ is a decreasing sequence of functions on $[a, b]$. Define functions $f^L\colon [a, b] \to \mathbf{R}$ and $f^U\colon [a, b] \to \mathbf{R}$ by

$$f^L(x) = \lim_{n\to\infty} g_n(x) \quad \text{and} \quad f^U(x) = \lim_{n\to\infty} h_n(x).$$

Taking the limit as $n \to \infty$ of both equations in 3.36 and using the Bounded Convergence Theorem (3.26) along with Exercise 7 in Section 1A, we see that $f^L$ and $f^U$ are Lebesgue measurable functions and

$$L(f, [a, b]) = \int_{[a, b]} f^L \, d\lambda \quad \text{and} \quad U(f, [a, b]) = \int_{[a, b]} f^U \, d\lambda. \tag{3.37}$$

Now 3.37 implies that $f$ is Riemann integrable if and only if

$$\int_{[a, b]} (f^U - f^L) \, d\lambda = 0.$$

Because $f^L(x) \le f(x) \le f^U(x)$ for all $x \in [a, b]$, the equation above holds if and only if

$$|\{x \in [a, b] : f^U(x) \ne f^L(x)\}| = 0.$$

The remaining details of the proof can be completed by noting that

$$\{x \in [a, b] : f^U(x) \ne f^L(x)\} = \{x \in [a, b] : f \text{ is not continuous at } x\}. \tag{3.38}$$
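The lower and upper sums in 3.36 can be computed exactly in simple cases. The sketch below is an illustration under an explicit assumption rather than a general algorithm: it handles only increasing $f$, for which the infimum and supremum of $f$ over a closed subinterval $I_j$ are the values at its left and right endpoints, and it uses exact rational arithmetic. For $f(x) = x^2$ on $[0, 1]$ the gap $U(f, P_n, [0, 1]) - L(f, P_n, [0, 1])$ equals $2^{-n}$. For the increasing function $\chi_{[1/2, 1]}$, which is discontinuous only at $\frac12$, the lower sums equal $\frac12$ and the upper sums exceed $\frac12$ by $2^{-n}$, consistent with 3.34. The function names are hypothetical.

```python
from fractions import Fraction


def dyadic_riemann_sums(f, a, b, n):
    """Return (L(f, P_n, [a, b]), U(f, P_n, [a, b])) for P_n with 2**n equal pieces.

    Assumes f is increasing on [a, b], so the inf and sup of f over each closed
    subinterval are f(left endpoint) and f(right endpoint).
    """
    a, b = Fraction(a), Fraction(b)
    pieces = 2**n
    width = (b - a) / pieces
    points = [a + j * width for j in range(pieces + 1)]
    lower = sum((f(points[j]) * width for j in range(pieces)), Fraction(0))
    upper = sum((f(points[j + 1]) * width for j in range(pieces)), Fraction(0))
    return lower, upper


def square(x):
    return x * x


def half_step(x):
    """The increasing function chi_{[1/2, 1]} on [0, 1]."""
    return Fraction(1) if x >= Fraction(1, 2) else Fraction(0)
```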
We previously defined the notation $\int_a^b f$ to mean the Riemann integral of $f$. Because the Riemann integral and Lebesgue integral agree for Riemann integrable functions (see 3.34), we now redefine $\int_a^b f$ to denote the Lebesgue integral.


3.39 Definition $\int_a^b f$

Suppose $-\infty \le a < b \le \infty$ and $f\colon (a, b) \to \mathbf{R}$ is Lebesgue measurable. Then

• $\int_a^b f$ and $\int_a^b f(x) \, dx$ mean $\int_{(a, b)} f \, d\lambda$, where $\lambda$ is Lebesgue measure on $\mathbf{R}$;

• $\int_b^a f$ is defined to be $-\int_a^b f$.

The definition in the second bullet point above is made so that equations such as

$$\int_a^b f = \int_a^c f + \int_c^b f$$

remain valid even if, for example, $a < b < c$.


Approximation by Nice Functions 


In the next definition, the notation $\|f\|_1$ should be $\|f\|_{1,\mu}$ because it depends upon the measure $\mu$ as well as upon $f$. However, $\mu$ is usually clear from the context. In some books, you may see the notation $\mathcal{L}^1(X, \mathcal{S}, \mu)$ instead of $\mathcal{L}^1(\mu)$.


3.40 Definition $\|f\|_1$; $\mathcal{L}^1(\mu)$

Suppose $(X, \mathcal{S}, \mu)$ is a measure space. If $f\colon X \to [-\infty, \infty]$ is $\mathcal{S}$-measurable, then the $\mathcal{L}^1$-norm of $f$ is denoted by $\|f\|_1$ and is defined by

$$\|f\|_1 = \int |f| \, d\mu.$$

The Lebesgue space $\mathcal{L}^1(\mu)$ is defined by

$$\mathcal{L}^1(\mu) = \{f : f \text{ is an } \mathcal{S}\text{-measurable function from } X \text{ to } \mathbf{R} \text{ and } \|f\|_1 < \infty\}.$$

The terminology and notation used above are convenient even though $\|\cdot\|_1$ might not be a genuine norm (to be defined in Chapter 6).


3.41 Example $\mathcal{L}^1(\mu)$ functions that take on only finitely many values

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $E_1, \dots, E_n$ are disjoint subsets of $X$. Suppose $a_1, \dots, a_n$ are distinct nonzero real numbers. Then

$$a_1 \chi_{E_1} + \cdots + a_n \chi_{E_n} \in \mathcal{L}^1(\mu)$$

if and only if $E_k \in \mathcal{S}$ and $\mu(E_k) < \infty$ for all $k \in \{1, \dots, n\}$. Furthermore,

$$\|a_1 \chi_{E_1} + \cdots + a_n \chi_{E_n}\|_1 = |a_1| \mu(E_1) + \cdots + |a_n| \mu(E_n).$$
3.42 Example $\ell^1$

If $\mu$ is counting measure on $\mathbf{Z}^+$ and $x = x_1, x_2, \dots$ is a sequence of real numbers (thought of as a function on $\mathbf{Z}^+$), then $\|x\|_1 = \sum_{k=1}^\infty |x_k|$. In this case, $\mathcal{L}^1(\mu)$ is often denoted by $\ell^1$ (pronounced little-el-one). In other words, $\ell^1$ is the set of all sequences $x_1, x_2, \dots$ of real numbers such that $\sum_{k=1}^\infty |x_k| < \infty$.
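For counting measure on $\mathbf{Z}^+$, $\|x\|_1$ is the supremum of the partial sums $\sum_{k=1}^n |x_k|$, so membership in $\ell^1$ can be explored numerically, although no finite computation proves it. The sketch below (hypothetical helper names, examples of our own choosing) compares $x_k = (-1)^k/k^2$, which is in $\ell^1$ with $\|x\|_1 = \pi^2/6$, and $y_k = 1/k$, whose partial sums grow like $\log n$, so $y \notin \ell^1$ even though $y_k \to 0$.

```python
import math


def l1_partial_norm(x, n: int) -> float:
    """Sum of |x(k)| for k = 1, ..., n: the integral of |x| over {1, ..., n} for counting measure."""
    return math.fsum(abs(x(k)) for k in range(1, n + 1))


def x_seq(k: int) -> float:
    """(-1)**k / k**2: a sequence in l^1."""
    return (-1) ** k / k**2


def y_seq(k: int) -> float:
    """1 / k: tends to 0 but is not in l^1."""
    return 1 / k
```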


The easy proof of the following result is left to the reader. 


3.43 properties of the $\mathcal{L}^1$-norm

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f, g \in \mathcal{L}^1(\mu)$. Then

• $\|f\|_1 \ge 0$;
• $\|f\|_1 = 0$ if and only if $f(x) = 0$ for almost every $x \in X$;
• $\|cf\|_1 = |c| \, \|f\|_1$ for all $c \in \mathbf{R}$;
• $\|f + g\|_1 \le \|f\|_1 + \|g\|_1$.


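The next result states that every function in $\mathcal{L}^1(\mu)$ can be approximated in $\mathcal{L}^1$-norm by measurable functions that take on only finitely many values.


3.44 approximation by simple functions

Suppose $\mu$ is a measure and $f \in \mathcal{L}^1(\mu)$. Then for every $\varepsilon > 0$, there exists a simple function $g \in \mathcal{L}^1(\mu)$ such that

$$\|f - g\|_1 < \varepsilon.$$

Proof Suppose $\varepsilon > 0$. Then there exist simple functions $g_1, g_2 \in \mathcal{L}^1(\mu)$ such that $0 \le g_1 \le f^+$ and $0 \le g_2 \le f^-$ and

$$\int (f^+ - g_1) \, d\mu < \frac{\varepsilon}{2} \quad \text{and} \quad \int (f^- - g_2) \, d\mu < \frac{\varepsilon}{2},$$

where we have used 3.9 to provide the existence of $g_1, g_2$ with these properties.

Let $g = g_1 - g_2$. Then $g$ is a simple function in $\mathcal{L}^1(\mu)$ and

$$\begin{aligned}
\|f - g\|_1 &= \|(f^+ - g_1) - (f^- - g_2)\|_1 \\
&\le \int (f^+ - g_1) \, d\mu + \int (f^- - g_2) \, d\mu \\
&< \varepsilon,
\end{aligned}$$

as desired.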
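The proof of 3.44 uses 3.9 only for existence. The sketch below assumes one common concrete construction: round $f^+$ and $f^-$ down to multiples of $2^{-n}$ and cap at $n$. We have not checked that it matches 2.89 in every detail. It measures $\|f - g\|_1$ for the hypothetical example $f(x) = x - \frac12$ on $[0, 1]$ with Lebesgue measure, where the error works out to exactly $2^{-n-1}$. For this linear $f$ the integrand is linear on each cell of a dyadic grid with $2^m \ge 2^{n+1}$ cells, so the midpoint rule is exact up to rounding.

```python
import math


def dyadic_simple(v: float, n: int) -> float:
    """Round v >= 0 down to a multiple of 2**-n and cap at n (a common construction, assumed here)."""
    return min(math.floor(v * 2**n) / 2**n, n)


def l1_error(f, n: int, m: int = 12) -> float:
    """Midpoint-rule value of ||f - g||_1 on [0, 1], where g = g1 - g2.

    g1 and g2 are the dyadic simple approximations of f+ and f-.
    """
    cells = 2**m
    total = 0.0
    for i in range(cells):
        x = (i + 0.5) / cells
        v = f(x)
        g = dyadic_simple(max(v, 0.0), n) - dyadic_simple(max(-v, 0.0), n)
        total += abs(v - g)
    return total / cells


def shifted_identity(x: float) -> float:
    """Hypothetical example f(x) = x - 1/2, which takes both signs on [0, 1]."""
    return x - 0.5
```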
• The notation $\mathcal{L}^1(\mathbf{R})$ denotes $\mathcal{L}^1(\lambda)$, where $\lambda$ is Lebesgue measure on either the Borel subsets of $\mathbf{R}$ or the Lebesgue measurable subsets of $\mathbf{R}$.

• When working with $\mathcal{L}^1(\mathbf{R})$, the notation $\|f\|_1$ denotes the integral of the absolute value of $f$ with respect to Lebesgue measure on $\mathbf{R}$.


3.46 Definition step function

A step function is a function $g:\mathbf{R}\to\mathbf{R}$ of the form

$$g=a_1\chi_{I_1}+\cdots+a_n\chi_{I_n},$$

where $I_1,\dots,I_n$ are intervals of $\mathbf{R}$ and $a_1,\dots,a_n$ are nonzero real numbers.


Suppose $g$ is a step function of the form above and the intervals $I_1,\dots,I_n$ are disjoint. Then

$$\|g\|_1=|a_1|\,|I_1|+\cdots+|a_n|\,|I_n|.$$

In particular, $g\in\mathcal{L}^1(\mathbf{R})$ if and only if all the intervals $I_1,\dots,I_n$ are bounded.
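For disjoint bounded intervals, the formula for $\|g\|_1$ above reduces to a finite sum. The sketch below evaluates that sum for a hypothetical step function with intervals $(0,1)$, $[2,5]$, $(6,6.5]$ and coefficients $3,-2,4$, and compares it with a midpoint-rule estimate of $\int|g|$. Endpoints are treated as included, which does not affect the integral.

```python
def step_l1_norm(terms):
    """||a_1 chi_{I_1} + ... + a_n chi_{I_n}||_1 for disjoint bounded intervals,
    each term given as (a_k, (left_k, right_k))."""
    return sum(abs(a) * (right - left) for a, (left, right) in terms)

def step_value(terms, x):
    """Value of the step function at x (endpoints treated as included)."""
    return sum(a for a, (left, right) in terms if left <= x <= right)

def numeric_l1_norm(terms, lo, hi, n=90_000):
    """Midpoint-rule estimate of the integral of |g| over [lo, hi]."""
    h = (hi - lo) / n
    return sum(abs(step_value(terms, lo + (i + 0.5) * h)) for i in range(n)) * h

g = [(3.0, (0.0, 1.0)), (-2.0, (2.0, 5.0)), (4.0, (6.0, 6.5))]
print(step_l1_norm(g))                 # 11.0, that is, 3*1 + 2*3 + 4*0.5
print(numeric_l1_norm(g, -1.0, 8.0))   # close to 11
```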

The intervals in the definition of a step function can be open intervals, closed intervals, or half-open intervals. We will be using step functions in integrals, where the inclusion or exclusion of the endpoints of the intervals does not matter.


Even though the coefficients $a_1,\dots,a_n$ in the definition of a step function are required to be nonzero, the function 0 that is identically 0 on $\mathbf{R}$ is a step function. To see this, take $n=1$, $a_1=1$, and $I_1=\varnothing$.


3.47 approximation by step functions 


Suppose $f\in\mathcal{L}^1(\mathbf{R})$. Then for every $\varepsilon>0$, there exists a step function $g\in\mathcal{L}^1(\mathbf{R})$ such that

$$\|f-g\|_1<\varepsilon.$$


Proof Suppose $\varepsilon>0$. By 3.44, there exist Borel (or Lebesgue) measurable subsets $A_1,\dots,A_n$ of $\mathbf{R}$ and nonzero numbers $a_1,\dots,a_n$ such that $|A_k|<\infty$ for all $k\in\{1,\dots,n\}$ and

$$\Big\|f-\sum_{k=1}^n a_k\chi_{A_k}\Big\|_1<\frac{\varepsilon}{2}.$$

For each $k\in\{1,\dots,n\}$, there is an open subset $G_k$ of $\mathbf{R}$ that contains $A_k$ and whose Lebesgue measure is as close as we want to $|A_k|$ [by part (e) of 2.71]. Each open subset of $\mathbf{R}$, including each $G_k$, is a countable union of disjoint open intervals. Thus for each $k$, there is a set $E_k$ that is a finite union of bounded open intervals contained in $G_k$ whose Lebesgue measure is as close as we want to $|G_k|$. Hence for each $k$, there is a set $E_k$ that is a finite union of bounded intervals such that

$$|E_k\setminus A_k|+|A_k\setminus E_k|\le|G_k\setminus A_k|+|G_k\setminus E_k|<\frac{\varepsilon}{2|a_k|n};$$

in other words,

$$\|\chi_{A_k}-\chi_{E_k}\|_1<\frac{\varepsilon}{2|a_k|n}.$$
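Now

$$\begin{aligned}\Big\|f-\sum_{k=1}^n a_k\chi_{E_k}\Big\|_1&\le\Big\|f-\sum_{k=1}^n a_k\chi_{A_k}\Big\|_1+\Big\|\sum_{k=1}^n a_k\chi_{A_k}-\sum_{k=1}^n a_k\chi_{E_k}\Big\|_1\\&<\frac{\varepsilon}{2}+\sum_{k=1}^n|a_k|\,\|\chi_{A_k}-\chi_{E_k}\|_1\\&<\varepsilon.\end{aligned}$$

Each $E_k$ is a finite union of bounded intervals. Thus the inequality above completes the proof because $\sum_{k=1}^n a_k\chi_{E_k}$ is a step function.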


Luzin’s Theorem (2.91 and 2.93) gives a spectacular way to approximate a Borel 
measurable function by a continuous function. However, the following approximation 
theorem is usually more useful than Luzin’s Theorem. For example, the next result 
plays a major role in the proof of the Lebesgue Differentiation Theorem (4.10). 


3.48 approximation by continuous functions 


Suppose $f\in\mathcal{L}^1(\mathbf{R})$. Then for every $\varepsilon>0$, there exists a continuous function $g:\mathbf{R}\to\mathbf{R}$ such that

$$\|f-g\|_1<\varepsilon$$

and $\{x\in\mathbf{R}:g(x)\ne0\}$ is a bounded set.


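Proof For every $a_1,\dots,a_n,b_1,\dots,b_n,c_1,\dots,c_n\in\mathbf{R}$ and $g_1,\dots,g_n\in\mathcal{L}^1(\mathbf{R})$, we have

$$\begin{aligned}\Big\|f-\sum_{k=1}^n a_kg_k\Big\|_1&=\Big\|f-\sum_{k=1}^n a_k\chi_{[b_k,c_k]}+\sum_{k=1}^n a_k\big(\chi_{[b_k,c_k]}-g_k\big)\Big\|_1\\&\le\Big\|f-\sum_{k=1}^n a_k\chi_{[b_k,c_k]}\Big\|_1+\sum_{k=1}^n|a_k|\,\big\|\chi_{[b_k,c_k]}-g_k\big\|_1,\end{aligned}$$

where the inequality above follows from 3.43. By 3.47, we can choose $a_1,\dots,a_n,b_1,\dots,b_n,c_1,\dots,c_n\in\mathbf{R}$ to make $\|f-\sum_{k=1}^n a_k\chi_{[b_k,c_k]}\|_1$ as small as we wish. The figure here then shows that there exist continuous functions $g_1,\dots,g_n\in\mathcal{L}^1(\mathbf{R})$, each equal to 0 outside a bounded interval, that make $\sum_{k=1}^n|a_k|\,\|\chi_{[b_k,c_k]}-g_k\|_1$ as small as we wish. Now take $g=\sum_{k=1}^n a_kg_k$; then $g$ is continuous and $\{x\in\mathbf{R}:g(x)\ne0\}$ is bounded.

[Figure: the graph of a continuous function $g_k$ such that $\|\chi_{[b_k,c_k]}-g_k\|_1$ is small.]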
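The figure in the proof of 3.48 can be made concrete under an assumption about the shape of $g_k$: take $g_k$ equal to 1 on $[b_k,c_k]$, equal to 0 outside $(b_k-\delta,c_k+\delta)$, and linear in between. Then $\|\chi_{[b_k,c_k]}-g_k\|_1$ is the total area of two triangles, namely $\delta$, and $g_k$ vanishes outside a bounded interval. The sketch below (function names hypothetical) checks this with a midpoint rule.

```python
def trapezoid(b, c, delta):
    """Continuous function equal to 1 on [b, c], 0 outside (b - delta, c + delta),
    and linear in between (an assumed shape for g_k in the proof of 3.48)."""
    def g(x):
        if b <= x <= c:
            return 1.0
        if b - delta < x < b:
            return (x - (b - delta)) / delta
        if c < x < c + delta:
            return ((c + delta) - x) / delta
        return 0.0
    return g

def l1_distance_to_indicator(b, c, delta, n=100_000):
    """Midpoint-rule estimate of ||chi_[b,c] - g||_1, integrating over
    [b - delta, c + delta], outside of which both functions vanish."""
    g = trapezoid(b, c, delta)
    lo, hi = b - delta, c + delta
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        chi = 1.0 if b <= x <= c else 0.0
        total += abs(chi - g(x)) * h
    return total

for delta in (0.5, 0.1, 0.01):
    print(delta, l1_distance_to_indicator(0.0, 1.0, delta))   # approximately delta
```

Shrinking $\delta$ makes the error as small as desired, which is the step the figure conveys.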
EXERCISES 3B

1 Give an example of a sequence $f_1,f_2,\dots$ of functions from $\mathbf{Z}^+$ to $(0,\infty)$ such that

$$\lim_{k\to\infty}f_k(m)=0$$

for every $m\in\mathbf{Z}^+$ but $\lim_{k\to\infty}\int f_k\,d\mu=1$, where $\mu$ is counting measure on $\mathbf{Z}^+$.


2 Give an example of a sequence $f_1,f_2,\dots$ of continuous functions from $\mathbf{R}$ to $[0,1]$ such that

$$\lim_{k\to\infty}f_k(x)=0$$

for every $x\in\mathbf{R}$ but $\lim_{k\to\infty}\int f_k\,d\lambda=\infty$, where $\lambda$ is Lebesgue measure on $\mathbf{R}$.


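3 Suppose $\lambda$ is Lebesgue measure on $\mathbf{R}$ and $f:\mathbf{R}\to\mathbf{R}$ is a Borel measurable function such that $\int|f|\,d\lambda<\infty$. Define $g:\mathbf{R}\to\mathbf{R}$ by

$$g(x)=\int_{(-\infty,x)}f\,d\lambda.$$

Prove that $g$ is uniformly continuous on $\mathbf{R}$.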


4 (a) Suppose $(X,\mathcal{S},\mu)$ is a measure space with $\mu(X)<\infty$. Suppose that $f:X\to[0,\infty)$ is a bounded $\mathcal{S}$-measurable function. Prove that

$$\int f\,d\mu=\inf\Big\{\sum_{j=1}^m\mu(A_j)\sup_{A_j}f : A_1,\dots,A_m\text{ is an }\mathcal{S}\text{-partition of }X\Big\}.$$

(b) Show that the conclusion of part (a) can fail if the hypothesis that $f$ is bounded is replaced by the hypothesis that $\int f\,d\mu<\infty$.

(c) Show that the conclusion of part (a) can fail if the condition that $\mu(X)<\infty$ is deleted.

[Part (a) of this exercise shows that if we had defined an upper Lebesgue sum, then we could have used it to define the integral. However, parts (b) and (c) show that the hypotheses that $f$ is bounded and that $\mu(X)<\infty$ would be needed if defining the integral via the equation above. The definition of the integral via the lower Lebesgue sum does not require these hypotheses, showing the advantage of using the approach via the lower Lebesgue sum.]


5 Let $\lambda$ denote Lebesgue measure on $\mathbf{R}$. Suppose $f:\mathbf{R}\to\mathbf{R}$ is a Borel measurable function such that $\int|f|\,d\lambda<\infty$. Prove that

$$\lim_{k\to\infty}\int_{[-k,k]}f\,d\lambda=\int f\,d\lambda.$$


6 Let $\lambda$ denote Lebesgue measure on $\mathbf{R}$. Give an example of a continuous function $f:[0,\infty)\to\mathbf{R}$ such that $\lim_{t\to\infty}\int_{[0,t]}f\,d\lambda$ exists (in $\mathbf{R}$) but $\int_{[0,\infty)}f\,d\lambda$ is not defined.
7 Let $\lambda$ denote Lebesgue measure on $\mathbf{R}$. Give an example of a continuous function $f:(0,1)\to\mathbf{R}$ such that $\lim_{n\to\infty}\int_{(\frac1n,1)}f\,d\lambda$ exists (in $\mathbf{R}$) but $\int_{(0,1)}f\,d\lambda$ is not defined.

8 Verify the assertion in 3.38.

9 Verify the assertion in Example 3.41.


10 (a) Suppose $(X,\mathcal{S},\mu)$ is a measure space such that $\mu(X)<\infty$. Suppose $p,r$ are positive numbers with $p<r$. Prove that if $f:X\to[0,\infty)$ is an $\mathcal{S}$-measurable function such that $\int f^r\,d\mu<\infty$, then $\int f^p\,d\mu<\infty$.

(b) Give an example to show that the result in part (a) can be false without the hypothesis that $\mu(X)<\infty$.


11 Suppose $(X,\mathcal{S},\mu)$ is a measure space and $f\in\mathcal{L}^1(\mu)$. Prove that

$$\{x\in X:f(x)\ne0\}$$

is the countable union of sets with finite $\mu$-measure.


12 Suppose … Prove that $\lim_{k\to\infty}\int_0^1 f_k=0$.


13 Give an example of a sequence of nonnegative Borel measurable functions $f_1,f_2,\dots$ on $[0,1]$ such that both the following conditions hold:

• $\lim_{k\to\infty}\int_0^1 f_k=0$;
• $\sup_{k\ge m}f_k(x)=\infty$ for every $m\in\mathbf{Z}^+$ and every $x\in[0,1]$.


14 Let $\lambda$ denote Lebesgue measure on $\mathbf{R}$.

(a) Let $f(x)=1/\sqrt{x}$. Prove that $\int_{[0,1]}f\,d\lambda=2$.

(b) Let $f(x)=1/(1+x^2)$. Prove that $\int_{\mathbf{R}}f\,d\lambda=\pi$.

(c) Let $f(x)=(\sin x)/x$. Show that the integral $\int_{(0,\infty)}f\,d\lambda$ is not defined but $\lim_{t\to\infty}\int_{(0,t)}f\,d\lambda$ exists in $\mathbf{R}$.


15 Prove or give a counterexample: If $G$ is an open subset of $(0,1)$, then $\chi_G$ is Riemann integrable on $[0,1]$.

16 Suppose $f\in\mathcal{L}^1(\mathbf{R})$.

(a) For $t\in\mathbf{R}$, define $f_t:\mathbf{R}\to\mathbf{R}$ by $f_t(x)=f(x-t)$. Prove that

$$\lim_{t\to0}\|f-f_t\|_1=0.$$

(b) For $t>0$, define $f_t:\mathbf{R}\to\mathbf{R}$ by $f_t(x)=f(tx)$. Prove that

$$\lim_{t\to1}\|f-f_t\|_1=0.$$


Does there exist a Lebesgue measurable set that fills up exactly half of each interval? To get a feeling for this question, consider the set $E=[0,\frac18]\cup[\frac14,\frac38]\cup[\frac12,\frac58]\cup[\frac34,\frac78]$. This set $E$ has the property that

$$|E\cap[0,b]|=\frac b2$$

for $b=0,\frac14,\frac12,\frac34,1$. Does there exist a Lebesgue measurable set $E\subset[0,1]$, perhaps constructed in a fashion similar to the Cantor set, such that the equation above holds for all $b\in[0,1]$?
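The property of the set $E$ above is easy to check with exact rational arithmetic. The Python sketch below (the helper name `measure_up_to` is hypothetical) confirms $|E\cap[0,b]|=\frac b2$ at $b=0,\frac14,\frac12,\frac34,1$ and shows that it fails at $b=\frac18$, where $|E\cap[0,\frac18]|=\frac18$.

```python
from fractions import Fraction as Fr

# E = [0, 1/8] ∪ [1/4, 3/8] ∪ [1/2, 5/8] ∪ [3/4, 7/8], stored as disjoint intervals.
E = [(Fr(0), Fr(1, 8)), (Fr(1, 4), Fr(3, 8)), (Fr(1, 2), Fr(5, 8)), (Fr(3, 4), Fr(7, 8))]

def measure_up_to(intervals, b):
    """Return |E ∩ [0, b]| for E a finite union of disjoint intervals in [0, 1]."""
    return sum((max(Fr(0), min(right, b) - left) for left, right in intervals), Fr(0))

for b in (Fr(0), Fr(1, 4), Fr(1, 2), Fr(3, 4), Fr(1)):
    assert measure_up_to(E, b) == b / 2          # the property stated above

print(measure_up_to(E, Fr(1, 8)), Fr(1, 8) / 2)  # 1/8 1/16: fails at b = 1/8
```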

In this chapter we see how to answer this question by considering differentia- 
tion issues. We begin by developing a powerful tool called the Hardy—Littlewood 
maximal inequality. This tool is used to prove an almost everywhere version of the 
Fundamental Theorem of Calculus. These results lead us to an important theorem 
about the density of Lebesgue measurable sets. 


Trinity College at the University of Cambridge in England. G. H. Hardy 
(1877-1947) and John Littlewood (1885-1977) were students and later faculty 
members here. If you have not already done so, you should read Hardy’s remarkable 
book A Mathematician’s Apology (do not skip the fascinating Foreword by C. P. 
Snow) and see the movie The Man Who Knew Infinity, which focuses on Hardy, 
Littlewood, and Srinivasa Ramanujan (1887-1920). 
CC-BY-SA Rafa Esteve 


© Sheldon Axler 2020
S. Axler, Measure, Integration & Real Analysis, Graduate Texts in Mathematics 282, https://doi.org/10.1007/978-3-030-33143-6_4
4A Hardy–Littlewood Maximal Function


Markov’s Inequality 


The following result, called Markov’s inequality, has a sweet, short proof. We will 
make good use of this result later in this chapter (see the proof of 4.10). Markov’s 
inequality also leads to Chebyshev’s inequality (see Exercise 2 in this section). 


4.1 Markov’s inequality 


Suppose $(X,\mathcal{S},\mu)$ is a measure space and $h\in\mathcal{L}^1(\mu)$. Then

$$\mu(\{x\in X:|h(x)|\ge c\})\le\frac1c\|h\|_1$$

for every $c>0$.


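Proof Suppose $c>0$. Then

$$\begin{aligned}\mu(\{x\in X:|h(x)|\ge c\})&=\frac1c\int_{\{x\in X:\,|h(x)|\ge c\}}c\,d\mu\\&\le\frac1c\int_{\{x\in X:\,|h(x)|\ge c\}}|h|\,d\mu\\&\le\frac1c\|h\|_1,\end{aligned}$$

as desired.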
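Markov's inequality can be checked directly on a small example. The sketch below assumes counting measure on $\{0,1,\dots,9\}$ and the hypothetical function $h(x)=x-4$, for which $\|h\|_1=25$, and compares $\mu(\{x:|h(x)|\ge c\})$ with $\frac1c\|h\|_1$ for several values of $c$.

```python
from fractions import Fraction

# Counting measure on X = {0, 1, ..., 9} and the function h(x) = x - 4.
X = range(10)

def h(x):
    return x - 4

def markov_sides(c):
    """Return (mu({x : |h(x)| >= c}), ||h||_1 / c) for counting measure mu."""
    lhs = sum(1 for x in X if abs(h(x)) >= c)
    rhs = Fraction(sum(abs(h(x)) for x in X)) / c
    return lhs, rhs

for c in (Fraction(1, 2), Fraction(1), Fraction(3), Fraction(5), Fraction(6)):
    lhs, rhs = markov_sides(c)
    assert lhs <= rhs          # Markov's inequality
    print(c, lhs, rhs)
```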


St. Petersburg University along the Neva River in St. Petersburg, Russia. 
Andrei Markov (1856-1922) was a student and then a faculty member here. 
CC-BY-SA A. Savin 
Vitali Covering Lemma


4.2 Definition 3 times a bounded nonempty open interval 


Suppose $I$ is a bounded nonempty open interval of $\mathbf{R}$. Then $3*I$ denotes the open interval with the same center as $I$ and three times the length of $I$.


4.3 Example 3 times an interval 
If $I=(0,10)$, then $3*I=(-10,20)$.


The next result is a key tool in the proof of the Hardy—Littlewood maximal 
inequality (4.8). 


4.4 Vitali Covering Lemma 


Suppose $I_1,\dots,I_n$ is a list of bounded nonempty open intervals of $\mathbf{R}$. Then there exists a disjoint sublist $I_{k_1},\dots,I_{k_m}$ such that

$$I_1\cup\cdots\cup I_n\subset(3*I_{k_1})\cup\cdots\cup(3*I_{k_m}).$$


4.5 Example Vitali Covering Lemma 
Suppose $n=4$ and

$$I_1=(0,10),\quad I_2=(9,15),\quad I_3=(14,22),\quad I_4=(21,31).$$

Then

$$3*I_1=(-10,20),\quad 3*I_2=(3,21),\quad 3*I_3=(6,30),\quad 3*I_4=(11,41).$$

Thus

$$I_1\cup I_2\cup I_3\cup I_4\subset(3*I_1)\cup(3*I_4).$$

In this example, $I_1,I_4$ is the only sublist of $I_1,I_2,I_3,I_4$ that produces the conclusion of the Vitali Covering Lemma.


Proof of 4.4 Let $k_1$ be such that

$$|I_{k_1}|=\max\{|I_1|,\dots,|I_n|\}.$$

Suppose $k_1,\dots,k_j$ have been chosen. Let $k_{j+1}$ be such that $|I_{k_{j+1}}|$ is as large as possible subject to the condition that $I_{k_1},\dots,I_{k_{j+1}}$ are disjoint. If there is no choice of $k_{j+1}$ such that $I_{k_1},\dots,I_{k_{j+1}}$ are disjoint, then the procedure terminates. Because we start with a finite list, the procedure must eventually terminate after some number $m$ of choices.

The technique used here is called a greedy algorithm because at each stage we select the largest remaining interval that is disjoint from the previously selected intervals.
Suppose $j\in\{1,\dots,n\}$. To complete the proof, we must show that

$$I_j\subset(3*I_{k_1})\cup\cdots\cup(3*I_{k_m}).$$

If $j\in\{k_1,\dots,k_m\}$, then the inclusion above obviously holds.

Thus assume that $j\notin\{k_1,\dots,k_m\}$. Because the process terminated without selecting $j$, the interval $I_j$ is not disjoint from all of $I_{k_1},\dots,I_{k_m}$. Let $I_{k_L}$ be the first interval on this list not disjoint from $I_j$; thus $I_j$ is disjoint from $I_{k_1},\dots,I_{k_{L-1}}$. Because $j$ was not chosen in step $L$, we conclude that $|I_{k_L}|\ge|I_j|$. Because $I_{k_L}\cap I_j\ne\varnothing$, this last inequality implies (easy exercise) that $I_j\subset3*I_{k_L}$, completing the proof.
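The greedy selection in the proof of 4.4 translates directly into code. In the Python sketch below (function names hypothetical; intervals are pairs of endpoints), scanning the intervals once in order of decreasing length and keeping each one that is disjoint from those already kept produces the same choices as the step-by-step procedure, provided ties are broken the same way (here, by smaller index). The helper `covered` checks the conclusion in the stronger form established in the proof: each $I_j$ lies inside a single $3*I_{k_L}$. On Example 4.5 the sketch selects $I_1,I_4$, which are indices 0 and 3 in Python's zero-based numbering.

```python
def three_times(interval):
    """3 * I: same center as I = (a, b), three times the length."""
    a, b = interval
    length = b - a
    return (a - length, b + length)

def disjoint(i, j):
    """Whether the open intervals i and j are disjoint."""
    return i[1] <= j[0] or j[1] <= i[0]

def vitali_sublist(intervals):
    """Greedy choice from the proof of 4.4: repeatedly take the longest interval
    disjoint from those already chosen (ties broken by smaller index).
    One scan in order of decreasing length gives the same choices."""
    order = sorted(range(len(intervals)),
                   key=lambda k: (-(intervals[k][1] - intervals[k][0]), k))
    chosen = []
    for k in order:
        if all(disjoint(intervals[k], intervals[j]) for j in chosen):
            chosen.append(k)
    return sorted(chosen)

def covered(intervals, chosen):
    """Check that each I_j lies inside 3 * I_k for some chosen k."""
    return all(any(three_times(intervals[k])[0] <= a and b <= three_times(intervals[k])[1]
                   for k in chosen)
               for a, b in intervals)

example = [(0, 10), (9, 15), (14, 22), (21, 31)]   # Example 4.5
print(vitali_sublist(example))                      # [0, 3], that is, I_1 and I_4
```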


Hardy–Littlewood Maximal Inequality


Now we come to a brilliant definition that turns out to be extraordinarily useful. 


4.6 Definition Hardy–Littlewood maximal function

Suppose $h:\mathbf{R}\to\mathbf{R}$ is a Lebesgue measurable function. Then the Hardy–Littlewood maximal function of $h$ is the function $h^*:\mathbf{R}\to[0,\infty]$ defined by

$$h^*(b)=\sup_{t>0}\frac{1}{2t}\int_{b-t}^{b+t}|h|.$$

In other words, $h^*(b)$ is the supremum over all bounded intervals centered at $b$ of the average of $|h|$ on those intervals.


4.7 Example Hardy–Littlewood maximal function of $\chi_{[0,1]}$

As usual, let $\chi_{[0,1]}$ denote the characteristic function of the interval $[0,1]$. Then

$$(\chi_{[0,1]})^*(b)=\begin{cases}\dfrac{1}{2(1-b)}&\text{if } b\le0,\\[4pt]1&\text{if } 0<b<1,\\[4pt]\dfrac{1}{2b}&\text{if } b\ge1,\end{cases}$$

as you should verify.

[Figure: the graph of $(\chi_{[0,1]})^*$ on $[-2,3]$.]
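The formula in Example 4.7 can be checked numerically by taking the supremum in the definition of $h^*$ over a finite grid of values of $t$. The grid $t=0.001,0.002,\dots,10$ is an assumption, chosen so that it contains the maximizing value of $t$ for each sample point below. For $h=\chi_{[0,1]}$, the average of $|h|$ over $(b-t,b+t)$ is $|[0,1]\cap(b-t,b+t)|/(2t)$.

```python
def maximal_chi01(b, ts):
    """Approximate (chi_[0,1])^*(b) by the largest average over (b - t, b + t), t in ts."""
    best = 0.0
    for t in ts:
        overlap = max(0.0, min(1.0, b + t) - max(0.0, b - t))  # |[0,1] ∩ (b-t, b+t)|
        best = max(best, overlap / (2 * t))
    return best

def formula(b):
    """Closed form from Example 4.7."""
    if b <= 0:
        return 1 / (2 * (1 - b))
    if b < 1:
        return 1.0
    return 1 / (2 * b)

ts = [k / 1000 for k in range(1, 10_001)]   # t = 0.001, 0.002, ..., 10
for b in (-2.0, -0.5, 0.0, 0.3, 0.9, 1.0, 2.5):
    print(b, maximal_chi01(b, ts), formula(b))
```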


If $h:\mathbf{R}\to\mathbf{R}$ is Lebesgue measurable and $c\in\mathbf{R}$, then $\{b\in\mathbf{R}:h^*(b)>c\}$ is an open subset of $\mathbf{R}$, as you are asked to prove in Exercise 9 in this section. Thus $h^*$ is a Borel measurable function.

Suppose $h\in\mathcal{L}^1(\mathbf{R})$ and $c>0$. Markov's inequality (4.1) estimates the size of the set on which $|h|$ is larger than $c$. Our next result estimates the size of the set on which $h^*$ is larger than $c$. The Hardy–Littlewood maximal inequality proved in the next result is a key ingredient in the proof of the Lebesgue Differentiation Theorem (4.10). Note that this next result is considerably deeper than Markov's inequality.
4.8 Hardy–Littlewood maximal inequality

Suppose $h\in\mathcal{L}^1(\mathbf{R})$. Then

$$|\{b\in\mathbf{R}:h^*(b)>c\}|\le\frac3c\|h\|_1$$

for every $c>0$.


Proof Suppose $F$ is a closed bounded subset of $\{b\in\mathbf{R}:h^*(b)>c\}$. We will show that $|F|\le\frac3c\int_{-\infty}^{\infty}|h|$, which implies our desired result [see Exercise 24(a) in Section 2D].

For each $b\in F$, there exists $t_b>0$ such that

$$\frac{1}{2t_b}\int_{b-t_b}^{b+t_b}|h|>c.\tag{4.9}$$

Clearly

$$F\subset\bigcup_{b\in F}(b-t_b,b+t_b).$$

The Heine–Borel Theorem (2.12) tells us that this open cover of a closed bounded set has a finite subcover. In other words, there exist $b_1,\dots,b_n\in F$ such that

$$F\subset(b_1-t_{b_1},b_1+t_{b_1})\cup\cdots\cup(b_n-t_{b_n},b_n+t_{b_n}).$$

To make the notation cleaner, relabel the open intervals above as $I_1,\dots,I_n$.

Now apply the Vitali Covering Lemma (4.4) to the list $I_1,\dots,I_n$, producing a disjoint sublist $I_{k_1},\dots,I_{k_m}$ such that

$$I_1\cup\cdots\cup I_n\subset(3*I_{k_1})\cup\cdots\cup(3*I_{k_m}).$$

Thus

$$\begin{aligned}|F|&\le|I_1\cup\cdots\cup I_n|\\&\le|(3*I_{k_1})\cup\cdots\cup(3*I_{k_m})|\\&\le|3*I_{k_1}|+\cdots+|3*I_{k_m}|\\&=3\big(|I_{k_1}|+\cdots+|I_{k_m}|\big)\\&<\frac3c\Big(\int_{I_{k_1}}|h|+\cdots+\int_{I_{k_m}}|h|\Big)\\&\le\frac3c\int_{-\infty}^{\infty}|h|,\end{aligned}$$

where the second-to-last inequality above comes from 4.9 (note that $|I_{k_j}|=2t_b$ for the choice of $b$ corresponding to $I_{k_j}$) and the last inequality holds because $I_{k_1},\dots,I_{k_m}$ are disjoint.

The last inequality completes the proof.
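The bound in 4.8 can be compared with an exact value in a simple case. For $h=\chi_{[0,1]}$ we have $\|h\|_1=1$, and the formula in Example 4.7 shows that $\{b\in\mathbf{R}:h^*(b)>c\}$ is the interval $(1-\frac{1}{2c},\frac{1}{2c})$, of length $\frac1c-1$, when $0<c<\frac12$, and is $(0,1)$ when $\frac12\le c<1$. The sketch below estimates these measures on a grid (the window $[-60,60]$ and the grid size are assumptions that suffice for the values of $c$ used) and compares them with the bound $\frac3c\|h\|_1$.

```python
def maximal_chi01(b):
    """(chi_[0,1])^*(b), using the closed form from Example 4.7."""
    if b <= 0:
        return 1 / (2 * (1 - b))
    if b < 1:
        return 1.0
    return 1 / (2 * b)

def superlevel_measure(c, lo=-60.0, hi=60.0, n=120_000):
    """Grid estimate of |{b in [lo, hi] : (chi_[0,1])^*(b) > c}|."""
    step = (hi - lo) / n
    return sum(step for i in range(n) if maximal_chi01(lo + (i + 0.5) * step) > c)

for c in (0.02, 0.1, 0.25, 0.75):
    # expected: 1/c - 1 for c < 1/2, and 1 for 1/2 <= c < 1; the bound from 4.8 is 3/c
    print(c, superlevel_measure(c), 3 / c)
```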
EXERCISES 4A

1 Suppose $(X,\mathcal{S},\mu)$ is a measure space and $h:X\to\mathbf{R}$ is an $\mathcal{S}$-measurable function. Prove that

$$\mu(\{x\in X:|h(x)|\ge c\})\le\frac{1}{c^p}\int|h|^p\,d\mu$$

for all positive numbers $c$ and $p$.


2 Suppose $(X,\mathcal{S},\mu)$ is a measure space with $\mu(X)=1$ and $h\in\mathcal{L}^1(\mu)$. Prove that

$$\mu\Big(\Big\{x\in X:\Big|h(x)-\int h\,d\mu\Big|\ge c\Big\}\Big)\le\frac{1}{c^2}\Big(\int h^2\,d\mu-\Big(\int h\,d\mu\Big)^2\Big)$$

for all $c>0$.

[The result above is called Chebyshev's inequality; it plays an important role in probability theory. Pafnuty Chebyshev (1821–1894) was Markov's thesis advisor.]


3 Suppose $(X,\mathcal{S},\mu)$ is a measure space. Suppose $h\in\mathcal{L}^1(\mu)$ and $\|h\|_1>0$. Prove that there is at most one number $c\in(0,\infty)$ such that

$$\mu(\{x\in X:|h(x)|\ge c\})=\frac1c\|h\|_1.$$

4 Show that the constant 3 in the Vitali Covering Lemma (4.4) cannot be replaced by a smaller positive constant.


5 Prove the assertion left as an exercise in the last sentence of the proof of the Vitali Covering Lemma (4.4).


6 Verify the formula in Example 4.7 for the Hardy–Littlewood maximal function of $\chi_{[0,1]}$.


7 Find a formula for the Hardy–Littlewood maximal function of the characteristic function of $[0,1]\cup[2,3]$.


8 Find a formula for the Hardy–Littlewood maximal function of the function $h:\mathbf{R}\to[0,\infty)$ defined by

$$h(x)=\begin{cases}x&\text{if } 0\le x\le1,\\0&\text{otherwise.}\end{cases}$$


9 Suppose $h:\mathbf{R}\to\mathbf{R}$ is Lebesgue measurable. Prove that

$$\{b\in\mathbf{R}:h^*(b)>c\}$$

is an open subset of $\mathbf{R}$ for every $c\in\mathbf{R}$.
10 Prove or give a counterexample: If $h:\mathbf{R}\to[0,\infty)$ is an increasing function, then $h^*$ is an increasing function.


11 Give an example of a Borel measurable function $h:\mathbf{R}\to[0,\infty)$ such that $h^*(b)<\infty$ for all $b\in\mathbf{R}$ but $\sup\{h^*(b):b\in\mathbf{R}\}=\infty$.


12 Show that $|\{b\in\mathbf{R}:h^*(b)=\infty\}|=0$ for every $h\in\mathcal{L}^1(\mathbf{R})$.

13 Show that there exists $h\in\mathcal{L}^1(\mathbf{R})$ such that $h^*(b)=\infty$ for every $b\in\mathbf{Q}$.


14 Suppose $h\in\mathcal{L}^1(\mathbf{R})$. Prove that

$$|\{b\in\mathbf{R}:h^*(b)\ge c\}|\le\frac3c\|h\|_1$$

for every $c>0$.

[This result slightly strengthens the Hardy–Littlewood maximal inequality (4.8) because the set on the left side above includes those $b\in\mathbf{R}$ such that $h^*(b)=c$. A much deeper strengthening comes from replacing the constant 3 in the Hardy–Littlewood maximal inequality with a smaller constant. In 2003, Antonios Melas answered what had been an open question about the best constant. He proved that the smallest constant that can replace 3 in the Hardy–Littlewood maximal inequality is $(11+\sqrt{61})/12\approx1.56752$; see Annals of Mathematics 157 (2003), 647–688.]
4B Derivatives of Integrals


Lebesgue Differentiation Theorem 


The next result states that the average amount by which a function in $\mathcal{L}^1(\mathbf{R})$ differs from its values is small almost everywhere on small intervals. The 2 in the denominator of the fraction in the result below could be deleted, but its presence makes the length of the interval of integration nicely match the denominator $2t$.

The next result is called the Lebesgue Differentiation Theorem, even though no 
derivative is in sight. However, we will soon see how another version of this result 
deals with derivatives. The hard work takes place in the proof of this first version. 


4.10 Lebesgue Differentiation Theorem, first version

Suppose $f\in\mathcal{L}^1(\mathbf{R})$. Then

$$\lim_{t\downarrow0}\frac{1}{2t}\int_{b-t}^{b+t}|f-f(b)|=0$$

for almost every $b\in\mathbf{R}$.


Before getting to the formal proof of this first version of the Lebesgue Differentiation Theorem, we pause to provide some motivation for the proof. If $b\in\mathbf{R}$ and $t>0$, then 3.25 gives the easy estimate

$$\frac{1}{2t}\int_{b-t}^{b+t}|f-f(b)|\le\sup\{|f(x)-f(b)|:|x-b|\le t\}.$$

If $f$ is continuous at $b$, then the right side of this inequality has limit 0 as $t\downarrow0$, proving 4.10 in the special case in which $f$ is continuous on $\mathbf{R}$.

To prove the Lebesgue Differentiation Theorem, we will approximate an arbitrary function in $\mathcal{L}^1(\mathbf{R})$ by a continuous function (using 3.48). The previous paragraph shows that the continuous function has the desired behavior. We will use the Hardy–Littlewood maximal inequality (4.8) to show that the approximation produces approximately the desired behavior. Now we are ready for the formal details of the proof.


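Proof of 4.10 Let $\delta>0$. By 3.48, for each $k\in\mathbf{Z}^+$ there exists a continuous function $h_k:\mathbf{R}\to\mathbf{R}$ such that

$$\|f-h_k\|_1<\frac{\delta}{k2^k}.\tag{4.11}$$

Let

$$B_k=\Big\{b\in\mathbf{R}:|f(b)-h_k(b)|\le\frac1k\text{ and }(f-h_k)^*(b)\le\frac1k\Big\}.$$

Then

$$\mathbf{R}\setminus B_k=\Big\{b\in\mathbf{R}:|f(b)-h_k(b)|>\frac1k\Big\}\cup\Big\{b\in\mathbf{R}:(f-h_k)^*(b)>\frac1k\Big\}.\tag{4.12}$$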
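Markov's inequality (4.1) as applied to the function $f-h_k$ and 4.11 imply that

$$\Big|\Big\{b\in\mathbf{R}:|f(b)-h_k(b)|>\frac1k\Big\}\Big|<\frac{\delta}{2^k}.\tag{4.13}$$

The Hardy–Littlewood maximal inequality (4.8) as applied to the function $f-h_k$ and 4.11 imply that

$$\Big|\Big\{b\in\mathbf{R}:(f-h_k)^*(b)>\frac1k\Big\}\Big|<\frac{3\delta}{2^k}.\tag{4.14}$$

Now 4.12, 4.13, and 4.14 imply that

$$|\mathbf{R}\setminus B_k|<\frac{4\delta}{2^k}.$$

Let

$$B=\bigcap_{k=1}^\infty B_k.$$

Then

$$|\mathbf{R}\setminus B|=\Big|\bigcup_{k=1}^\infty(\mathbf{R}\setminus B_k)\Big|\le\sum_{k=1}^\infty|\mathbf{R}\setminus B_k|<\sum_{k=1}^\infty\frac{4\delta}{2^k}=4\delta.\tag{4.15}$$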
k=1 KK — 


Suppose $b\in B$ and $t>0$. Then for each $k\in\mathbf{Z}^+$ we have

$$\begin{aligned}\frac{1}{2t}\int_{b-t}^{b+t}|f-f(b)|&\le\frac{1}{2t}\int_{b-t}^{b+t}\big(|f-h_k|+|h_k-h_k(b)|+|h_k(b)-f(b)|\big)\\&\le(f-h_k)^*(b)+\Big(\frac{1}{2t}\int_{b-t}^{b+t}|h_k-h_k(b)|\Big)+|h_k(b)-f(b)|\\&\le\frac2k+\frac{1}{2t}\int_{b-t}^{b+t}|h_k-h_k(b)|.\end{aligned}$$

Because $h_k$ is continuous, the last term is less than $\frac1k$ for all $t>0$ sufficiently close to 0 (how close is sufficiently close depends upon $k$). In other words, for each $k\in\mathbf{Z}^+$, we have

$$\frac{1}{2t}\int_{b-t}^{b+t}|f-f(b)|<\frac3k$$

for all $t>0$ sufficiently close to 0.

Hence we conclude that

$$\lim_{t\downarrow0}\frac{1}{2t}\int_{b-t}^{b+t}|f-f(b)|=0$$

for all $b\in B$.

Let $A$ denote the set of numbers $a\in\mathbf{R}$ such that

$$\lim_{t\downarrow0}\frac{1}{2t}\int_{a-t}^{a+t}|f-f(a)|$$

either does not exist or is nonzero. We have shown that $A\subset(\mathbf{R}\setminus B)$. Thus

$$|A|\le|\mathbf{R}\setminus B|<4\delta,$$

where the last inequality comes from 4.15. Because $\delta$ is an arbitrary positive number, the last inequality implies that $|A|=0$, completing the proof.
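The Lebesgue Differentiation Theorem allows an exceptional set of measure 0. For the hypothetical example $f=\chi_{[0,1]}$, the averages $\frac{1}{2t}\int_{b-t}^{b+t}|f-f(b)|$ are 0 for small $t$ at $b=\frac12$, but they equal $\frac12$ for every $t\in(0,1)$ at $b=0$, one of the two exceptional points. The sketch below estimates these averages with a midpoint rule.

```python
def chi01(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def mean_oscillation(f, b, t, n=10_000):
    """Midpoint-rule estimate of (1/(2t)) * integral over [b-t, b+t] of |f - f(b)|."""
    h = 2 * t / n
    fb = f(b)
    total = sum(abs(f(b - t + (i + 0.5) * h) - fb) for i in range(n)) * h
    return total / (2 * t)

for t in (0.1, 0.01, 0.001):
    # at b = 0.5 the averages are 0; at b = 0 they stay at 1/2
    print(t, mean_oscillation(chi01, 0.5, t), mean_oscillation(chi01, 0.0, t))
```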
Derivatives


You should remember the following definition from your calculus course. 


4.16 Definition derivative 


Suppose $g:I\to\mathbf{R}$ is a function defined on an open interval $I$ of $\mathbf{R}$ and $b\in I$. The derivative of $g$ at $b$, denoted $g'(b)$, is defined by

$$g'(b)=\lim_{t\to0}\frac{g(b+t)-g(b)}{t}$$

if the limit above exists, in which case $g$ is called differentiable at $b$.


We now turn to the Fundamental Theorem of Calculus and a powerful extension 
that avoids continuity. These results show that differentiation and integration can be 
thought of as inverse operations. 

You saw the next result in your calculus class, except now the function f is 
only required to be Lebesgue measurable (and its absolute value must have a finite 
Lebesgue integral). Of course, we also need to require f to be continuous at the 
crucial point b in the next result, because changing the value of f at a single number 
would not change the function g. 


4.17 Fundamental Theorem of Calculus 


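Suppose $f\in\mathcal{L}^1(\mathbf{R})$. Define $g:\mathbf{R}\to\mathbf{R}$ by

$$g(x)=\int_{-\infty}^x f.$$

Suppose $b\in\mathbf{R}$ and $f$ is continuous at $b$. Then $g$ is differentiable at $b$ and

$$g'(b)=f(b).$$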

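Proof If $t\ne0$, then

$$\Big|\frac{g(b+t)-g(b)}{t}-f(b)\Big|=\Big|\frac{\int_{-\infty}^{b+t}f-\int_{-\infty}^{b}f}{t}-f(b)\Big|=\Big|\frac{\int_b^{b+t}f}{t}-f(b)\Big|.$$

Therefore

$$\Big|\frac{g(b+t)-g(b)}{t}-f(b)\Big|=\Big|\frac1t\int_b^{b+t}\big(f-f(b)\big)\Big|,\tag{4.18}$$

and thus

$$\Big|\frac{g(b+t)-g(b)}{t}-f(b)\Big|\le\sup_{\{x\in\mathbf{R}:\,|x-b|<|t|\}}|f(x)-f(b)|.$$

If $\varepsilon>0$, then by the continuity of $f$ at $b$, the last quantity is less than $\varepsilon$ for $t$ sufficiently close to 0. Thus $g$ is differentiable at $b$ and $g'(b)=f(b)$.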
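The continuity hypothesis at $b$ in 4.17 matters. For the hypothetical choice $f=\chi_{[0,1]}$, the function $g(x)=\int_{-\infty}^x f$ equals $\min(\max(x,0),1)$. The sketch below computes difference quotients of $g$. At $b=\frac12$, where $f$ is continuous, they equal $f(\frac12)=1$. At $b=0$, where $f$ is discontinuous, the right-hand quotients are 1 and the left-hand quotients are 0, so $g'(0)$ does not exist.

```python
def g(x):
    """g(x) = integral from -infinity to x of chi_[0,1], in closed form."""
    return min(max(x, 0.0), 1.0)

def difference_quotient(b, t):
    return (g(b + t) - g(b)) / t

for t in (1e-2, -1e-2, 1e-4, -1e-4):
    print(t, difference_quotient(0.5, t), difference_quotient(0.0, t))
```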
A function in $\mathcal{L}^1(\mathbf{R})$ need not be continuous anywhere. Thus the Fundamental
Theorem of Calculus (4.17) might provide no information about differentiating the 
integral of such a function. However, our next result states that all is well almost 
everywhere, even in the absence of any continuity of the function being integrated. 


4.19 Lebesgue Differentiation Theorem, second version 


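Suppose $f\in\mathcal{L}^1(\mathbf{R})$. Define $g:\mathbf{R}\to\mathbf{R}$ by

$$g(x)=\int_{-\infty}^x f.$$

Then $g'(b)=f(b)$ for almost every $b\in\mathbf{R}$.

Proof Suppose $t\ne0$. Then from 4.18 we have

$$\begin{aligned}\Big|\frac{g(b+t)-g(b)}{t}-f(b)\Big|&=\Big|\frac1t\int_b^{b+t}\big(f-f(b)\big)\Big|\\&\le\frac{1}{|t|}\int_{b-|t|}^{b+|t|}|f-f(b)|\\&=2\cdot\frac{1}{2|t|}\int_{b-|t|}^{b+|t|}|f-f(b)|\end{aligned}$$

for all $b\in\mathbf{R}$. By the first version of the Lebesgue Differentiation Theorem (4.10), the last quantity has limit 0 as $t\to0$ for almost every $b\in\mathbf{R}$. Thus $g'(b)=f(b)$ for almost every $b\in\mathbf{R}$.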


Now we can answer the question raised on the opening page of this chapter. 


4.20 no set constitutes exactly half of each interval 


There does not exist a Lebesgue measurable set $E\subset[0,1]$ such that

$$|E\cap[0,b]|=\frac b2$$

for all $b\in[0,1]$.

Proof Suppose there does exist a Lebesgue measurable set $E\subset[0,1]$ with the property above. Define $g:\mathbf{R}\to\mathbf{R}$ by

$$g(b)=\int_{-\infty}^b\chi_E.$$

Thus $g(b)=\frac b2$ for all $b\in[0,1]$. Hence $g'(b)=\frac12$ for all $b\in(0,1)$.

The Lebesgue Differentiation Theorem (4.19) implies that $g'(b)=\chi_E(b)$ for almost every $b\in\mathbf{R}$. However, $\chi_E$ never takes on the value $\frac12$, which contradicts the conclusion of the previous paragraph. This contradiction completes the proof.
The next result says that a function in $\mathcal{L}^1(\mathbf{R})$ is equal almost everywhere to the limit of its average over small intervals. These two-sided results generalize more naturally to higher dimensions (take the average over balls centered at $b$) than the one-sided results.

4.21 $\mathcal{L}^1(\mathbf{R})$ function equals its local average almost everywhere

Suppose $f\in\mathcal{L}^1(\mathbf{R})$. Then

$$f(b)=\lim_{t\downarrow0}\frac{1}{2t}\int_{b-t}^{b+t}f$$

for almost every $b\in\mathbf{R}$.

Proof Suppose $t>0$. Then

$$\Big|\Big(\frac{1}{2t}\int_{b-t}^{b+t}f\Big)-f(b)\Big|=\Big|\frac{1}{2t}\int_{b-t}^{b+t}\big(f-f(b)\big)\Big|\le\frac{1}{2t}\int_{b-t}^{b+t}|f-f(b)|.$$

The desired result now follows from the first version of the Lebesgue Differentiation Theorem (4.10).


Again, the conclusion of the result above holds at every number b at which f is 
continuous. The remarkable part of the result above is that even if f is discontinuous 
everywhere, the conclusion holds for almost every real number b. 


Density 


The next definition captures the notion of the proportion of a set in small intervals 
centered at a number b. 


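4.22 Definition density

Suppose $E\subset\mathbf{R}$. The density of $E$ at a number $b\in\mathbf{R}$ is

$$\lim_{t\downarrow0}\frac{|E\cap(b-t,b+t)|}{2t}$$

if this limit exists (otherwise the density of $E$ at $b$ is undefined).

4.23 Example density of an interval

The density of $[0,1]$ at $b$ is

$$\begin{cases}1&\text{if } b\in(0,1),\\\frac12&\text{if } b=0\text{ or } b=1,\\0&\text{otherwise.}\end{cases}$$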
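The values in Example 4.23 can be seen from the ratio $|[0,1]\cap(b-t,b+t)|/(2t)$, which for all sufficiently small $t>0$ already equals its limit. The sketch below (the helper name is hypothetical) evaluates the ratio at $t=10^{-3}$.

```python
def density_ratio(b, t):
    """|[0,1] ∩ (b - t, b + t)| / (2t)."""
    overlap = max(0.0, min(1.0, b + t) - max(0.0, b - t))
    return overlap / (2 * t)

for b in (0.5, 0.0, 1.0, 2.0, -0.3):
    print(b, density_ratio(b, 1e-3))   # 1 inside (0, 1), 1/2 at the endpoints, 0 outside
```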
The next beautiful result shows the power of the techniques developed in this
chapter. 


4.24 Lebesgue Density Theorem 


Suppose $E\subset\mathbf{R}$ is a Lebesgue measurable set. Then the density of $E$ is 1 at almost every element of $E$ and is 0 at almost every element of $\mathbf{R}\setminus E$.

Proof First suppose $|E|<\infty$. Thus $\chi_E\in\mathcal{L}^1(\mathbf{R})$. Because

$$\frac{|E\cap(b-t,b+t)|}{2t}=\frac{1}{2t}\int_{b-t}^{b+t}\chi_E$$

for every $t>0$ and every $b\in\mathbf{R}$, the desired result follows immediately from 4.21.

Now consider the case where $|E|=\infty$ [which means that $\chi_E\notin\mathcal{L}^1(\mathbf{R})$ and hence 4.21 as stated cannot be used]. For $k\in\mathbf{Z}^+$, let $E_k=E\cap(-k,k)$. If $|b|<k$, then the density of $E$ at $b$ equals the density of $E_k$ at $b$. By the previous paragraph as applied to $E_k$, there are sets $F_k\subset E_k$ and $G_k\subset\mathbf{R}\setminus E_k$ such that $|F_k|=|G_k|=0$ and the density of $E_k$ equals 1 at every element of $E_k\setminus F_k$ and the density of $E_k$ equals 0 at every element of $(\mathbf{R}\setminus E_k)\setminus G_k$.

Let $F=\bigcup_{k=1}^\infty F_k$ and $G=\bigcup_{k=1}^\infty G_k$. Then $|F|=|G|=0$ and the density of $E$ is 1 at every element of $E\setminus F$ and is 0 at every element of $(\mathbf{R}\setminus E)\setminus G$.


The bad Borel set provided by the next result leads to a bad Borel measurable function. Specifically, let $E$ be the bad Borel set in 4.25. Then $\chi_E$ is a Borel measurable function that is discontinuous everywhere. Furthermore, the function $\chi_E$ cannot be modified on a set of measure 0 to be continuous anywhere (in contrast to the function $\chi_{\mathbf{Q}}$).

Even though the function $\chi_E$ discussed in the paragraph above is continuous nowhere and every modification of this function on a set of measure 0 is also continuous nowhere, the function $g$ defined by

$$g(b)=\int_0^b\chi_E$$

is differentiable almost everywhere (by 4.19).
The proof of 4.25 given below is based on an idea of Walter Rudin. 


The Lebesgue Density Theorem 
makes the example provided by the 
next result somewhat surprising. Be 
sure to spend some time pondering 
why the next result does not 
contradict the Lebesgue Density 
Theorem. Also, compare the next 
result to 4.20. 


4.25 bad Borel set 


There exists a Borel set $E\subset\mathbf{R}$ such that

$$0<|E\cap I|<|I|$$

for every nonempty bounded open interval $I$.
Proof We use the following fact in our construction:

4.26 Suppose $G$ is a nonempty open subset of $\mathbf{R}$. Then there exists a closed set $F\subset G\setminus\mathbf{Q}$ such that $|F|>0$.

To prove 4.26, let $J$ be a closed interval contained in $G$ such that $0<|J|$. Let $r_1,r_2,\dots$ be a list of all the rational numbers. Let

$$F=J\setminus\bigcup_{k=1}^\infty\Big(r_k-\frac{|J|}{2^{k+2}},\,r_k+\frac{|J|}{2^{k+2}}\Big).$$

Then $F$ is a closed subset of $\mathbf{R}$ and $F\subset J\setminus\mathbf{Q}\subset G\setminus\mathbf{Q}$. Also, $|J\setminus F|\le\frac12|J|$ because $J\setminus F\subset\bigcup_{k=1}^\infty\big(r_k-\frac{|J|}{2^{k+2}},r_k+\frac{|J|}{2^{k+2}}\big)$. Thus

$$|F|=|J|-|J\setminus F|\ge\frac12|J|>0,$$

completing the proof of 4.26.

To construct the set $E$ with the desired properties, let $I_1,I_2,\dots$ be a sequence consisting of all nonempty bounded open intervals of $\mathbf{R}$ with rational endpoints. Let $F_0=\hat F_0=\varnothing$, and inductively construct sequences $F_1,F_2,\dots$ and $\hat F_1,\hat F_2,\dots$ of closed subsets of $\mathbf{R}$ as follows: Suppose $n\in\mathbf{Z}^+$ and $F_0,\dots,F_{n-1}$ and $\hat F_0,\dots,\hat F_{n-1}$ have been chosen as closed sets that contain no rational numbers. Thus

$$I_n\setminus(\hat F_0\cup\cdots\cup\hat F_{n-1})$$

is a nonempty open set (nonempty because it contains all rational numbers in $I_n$). Applying 4.26 to the open set above, we see that there is a closed set $F_n$ contained in the set above such that $F_n$ contains no rational numbers and $|F_n|>0$. Applying 4.26 again, but this time to the open set

$$I_n\setminus(F_0\cup\cdots\cup F_n),$$

which is nonempty because it contains all rational numbers in $I_n$, we see that there is a closed set $\hat F_n$ contained in the set above such that $\hat F_n$ contains no rational numbers and $|\hat F_n|>0$.

Now let

$$E=\bigcup_{k=1}^\infty F_k.$$

Our construction implies that $F_k\cap\hat F_n=\varnothing$ for all $k,n\in\mathbf{Z}^+$. Thus $E\cap\hat F_n=\varnothing$ for all $n\in\mathbf{Z}^+$. Hence $\hat F_n\subset I_n\setminus E$ for all $n\in\mathbf{Z}^+$.

Suppose $I$ is a nonempty bounded open interval. Then $I_n\subset I$ for some $n\in\mathbf{Z}^+$. Thus

$$0<|F_n|\le|E\cap I_n|\le|E\cap I|.$$

Also,

$$|E\cap I|=|I|-|I\setminus E|\le|I|-|I_n\setminus E|\le|I|-|\hat F_n|<|I|,$$

completing the proof.
EXERCISES 4B

For $f\in\mathcal{L}^1(\mathbf{R})$ and $I$ an interval of $\mathbf{R}$ with $0<|I|<\infty$, let $f_I$ denote the average of $f$ on $I$. In other words, $f_I=\frac{1}{|I|}\int_I f$.

1 Suppose $f\in\mathcal{L}^1(\mathbf{R})$. Prove that

$$\lim_{t\downarrow0}\frac{1}{2t}\int_{b-t}^{b+t}\big|f-f_{[b-t,b+t]}\big|=0$$

for almost every $b\in\mathbf{R}$.


2 Suppose $f\in\mathcal{L}^1(\mathbf{R})$. Prove that

$$\lim_{t\downarrow0}\sup\Big\{\frac{1}{|I|}\int_I|f-f_I|:I\text{ is an interval of length }t\text{ containing }b\Big\}=0$$

for almost every $b\in\mathbf{R}$.


3 Suppose $f:\mathbf{R}\to\mathbf{R}$ is a Lebesgue measurable function such that $f^2\in\mathcal{L}^1(\mathbf{R})$. Prove that

$$\lim_{t\downarrow0}\frac{1}{2t}\int_{b-t}^{b+t}|f-f(b)|^2=0$$

for almost every $b\in\mathbf{R}$.


4 Prove that the Lebesgue Differentiation Theorem (4.19) still holds if the hypothesis that $\int_{-\infty}^{\infty}|f|<\infty$ is weakened to the requirement that $\int_{-\infty}^{x}|f|<\infty$ for all $x\in\mathbf{R}$.


5 Suppose $f:\mathbf{R}\to\mathbf{R}$ is a Lebesgue measurable function. Prove that

$$|f(b)|\le f^*(b)$$

for almost every $b\in\mathbf{R}$.

6 Prove that if $h\in\mathcal{L}^1(\mathbf{R})$ and $\int_{-\infty}^s h=0$ for all $s\in\mathbf{R}$, then $h=0$.

7 Give an example of a Borel subset of $\mathbf{R}$ whose density at 0 is not defined.

8 Give an example of a Borel subset of $\mathbf{R}$ whose density at 0 is $\frac13$.

9 Prove that if $t\in[0,1]$, then there exists a Borel set $E\subset\mathbf{R}$ such that the density of $E$ at 0 is $t$.

10 Suppose $E$ is a Lebesgue measurable subset of $\mathbf{R}$ such that the density of $E$ equals 1 at every element of $E$ and equals 0 at every element of $\mathbf{R}\setminus E$. Prove that $E=\varnothing$ or $E=\mathbf{R}$.
Product Measures


Lebesgue measure on $\mathbf{R}$ generalizes the notion of the length of an interval. In this chapter, we see how two-dimensional Lebesgue measure on $\mathbf{R}^2$ generalizes the notion of the area of a rectangle. More generally, we construct new measures that are the products of two measures.

Once these new measures have been constructed, the question arises of how to 
compute integrals with respect to these new measures. Beautiful theorems proved in 
the first decade of the twentieth century allow us to compute integrals with respect to 
product measures as iterated integrals involving the two measures that produced the 
product. Furthermore, we will see that under reasonable conditions we can switch 
the order of an iterated integral. 


Main building of Scuola Normale Superiore di Pisa, the university in Pisa, Italy, 
where Guido Fubini (1879-1943) received his PhD in 1900. In 1907 Fubini proved 
that under reasonable conditions, an integral with respect to a product measure can 
be computed as an iterated integral and that the order of integration can be switched. 
Leonida Tonelli (1885-1943) also taught for many years in Pisa; he also proved a 
crucial theorem about interchanging the order of integration in an iterated integral. 
CC-BY-SA Lucarelli 


5A Products of Measure Spaces


Products of σ-Algebras


Our first step in constructing product measures is to construct the product of two σ-algebras. We begin with the following definition.


Keep the figure shown here in mind when thinking of a rectangle in the sense defined above. However, remember that $A$ and $B$ need not be intervals as shown in the figure. Indeed, the concept of an interval makes no sense in the generality of arbitrary sets.

[Figure: a rectangle $A\times B$.]


Now we can define the product of two σ-algebras.


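5.2 Definition product of two σ-algebras; $\mathcal{S}\otimes\mathcal{T}$; measurable rectangle

Suppose $(X,\mathcal{S})$ and $(Y,\mathcal{T})$ are measurable spaces. Then

• the product $\mathcal{S}\otimes\mathcal{T}$ is defined to be the smallest σ-algebra on $X\times Y$ that contains
$$\{A\times B:A\in\mathcal{S},\,B\in\mathcal{T}\};$$

• a measurable rectangle in $\mathcal{S}\otimes\mathcal{T}$ is a set of the form $A\times B$, where $A\in\mathcal{S}$ and $B\in\mathcal{T}$.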


Using the terminology introduced in the second bullet point above, we can say that $\mathcal{S}\otimes\mathcal{T}$ is the smallest σ-algebra containing all the measurable rectangles in $\mathcal{S}\otimes\mathcal{T}$. Exercise 1 in this section asks you to show that the measurable rectangles in $\mathcal{S}\otimes\mathcal{T}$ are the only rectangles in $X\times Y$ that are in $\mathcal{S}\otimes\mathcal{T}$.

The notion of cross sections plays a crucial role in our development of product 
measures. First, we define cross sections of sets, and then we define cross sections of 
functions. 


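The notation $\mathcal{S} \times \mathcal{T}$ is not used because $\mathcal{S}$ and $\mathcal{T}$ are sets (of sets),
and thus the notation $\mathcal{S} \times \mathcal{T}$ already is defined to mean the set of all ordered
pairs of the form $(A, B)$, where $A \in \mathcal{S}$ and $B \in \mathcal{T}$.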
5.3 Definition cross sections of sets; $[E]_a$ and $[E]^b$


Suppose X and Y are sets and $E \subset X \times Y$. Then for $a \in X$ and $b \in Y$, the cross
sections $[E]_a$ and $[E]^b$ are defined by
$$[E]_a = \{y \in Y : (a, y) \in E\} \quad\text{and}\quad [E]^b = \{x \in X : (x, b) \in E\}.$$
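To make Definition 5.3 concrete, the following minimal Python sketch computes cross sections of a finite set E of ordered pairs. Finiteness is an assumption made only so that everything can be listed explicitly, and the helper names `section_at_x` and `section_at_y` are hypothetical; they simply transcribe the two set-builder formulas.

```python
def section_at_x(E, a):
    """Return [E]_a = {y : (a, y) in E} for a finite set E of ordered pairs."""
    return frozenset(y for (x, y) in E if x == a)


def section_at_y(E, b):
    """Return [E]^b = {x : (x, b) in E} for a finite set E of ordered pairs."""
    return frozenset(x for (x, y) in E if y == b)


# A small subset E of X x Y, where X = {0, 1, 2} and Y = {"p", "q", "r"}.
E = {(0, "p"), (0, "q"), (1, "q"), (2, "r")}

print(sorted(section_at_x(E, 0)))    # ['p', 'q']
print(sorted(section_at_y(E, "q")))  # [0, 1]
print(sorted(section_at_y(E, "r")))  # [2]
```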


5.4 Example cross sections of a subset of $X \times Y$

[Figure: cross sections of a subset of $X \times Y$.]

5.5 Example cross sections of rectangles 
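Suppose X and Y are sets and $A \subset X$ and $B \subset Y$. If $a \in X$ and $b \in Y$, then
$$[A \times B]_a = \begin{cases} B & \text{if } a \in A,\\ \varnothing & \text{if } a \notin A,\end{cases} \qquad\text{and}\qquad [A \times B]^b = \begin{cases} A & \text{if } b \in B,\\ \varnothing & \text{if } b \notin B,\end{cases}$$
as you should verify.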
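The claim in Example 5.5 can be spot-checked by brute force when X and Y are finite, an assumption made purely for illustration. The sketch below, whose helper names are hypothetical, builds $A \times B$ as a set of ordered pairs and compares every cross section with the two-case formula.

```python
from itertools import product


def section_at_x(E, a):
    """[E]_a = {y : (a, y) in E}."""
    return frozenset(y for (x, y) in E if x == a)


def section_at_y(E, b):
    """[E]^b = {x : (x, b) in E}."""
    return frozenset(x for (x, y) in E if y == b)


def rectangle_sections_match(X, Y, A, B):
    """Brute-force check of Example 5.5 for finite X, Y with A a subset of X, B of Y."""
    rect = set(product(A, B))  # the rectangle A x B as a set of ordered pairs
    for a in X:
        expected = frozenset(B) if a in A else frozenset()
        if section_at_x(rect, a) != expected:
            return False
    for b in Y:
        expected = frozenset(A) if b in B else frozenset()
        if section_at_y(rect, b) != expected:
            return False
    return True


print(rectangle_sections_match({1, 2, 3, 4}, {"u", "v", "w"}, {1, 3}, {"u", "w"}))  # True
```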


The next result shows that cross sections preserve measurability. 


5.6 cross sections of measurable sets are measurable 


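Suppose $\mathcal{S}$ is a σ-algebra on X and $\mathcal{T}$ is a σ-algebra on Y. If $E \in \mathcal{S} \otimes \mathcal{T}$, then
$$[E]_a \in \mathcal{T} \text{ for every } a \in X \quad\text{and}\quad [E]^b \in \mathcal{S} \text{ for every } b \in Y.$$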


Proof Let $\mathcal{E}$ denote the collection of subsets E of $X \times Y$ for which the conclusion
of this result holds. Then $A \times B \in \mathcal{E}$ for all $A \in \mathcal{S}$ and all $B \in \mathcal{T}$ (by Example 5.5).
The collection $\mathcal{E}$ is closed under complementation and countable unions because
$$[(X \times Y) \setminus E]_a = Y \setminus [E]_a$$
and
$$[E_1 \cup E_2 \cup \cdots]_a = [E_1]_a \cup [E_2]_a \cup \cdots$$
for all subsets $E, E_1, E_2, \ldots$ of $X \times Y$ and all $a \in X$, as you should verify, with
similar statements holding for cross sections with respect to all $b \in Y$.

Because $\mathcal{E}$ is a σ-algebra containing all the measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$, we
conclude that $\mathcal{E}$ contains $\mathcal{S} \otimes \mathcal{T}$.
Now we define cross sections of functions.


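5.7 Definition cross sections of functions; $[f]_a$ and $[f]^b$

Suppose X and Y are sets and $f\colon X \times Y \to \mathbf{R}$ is a function. Then for $a \in X$ and
$b \in Y$, the cross section functions $[f]_a\colon Y \to \mathbf{R}$ and $[f]^b\colon X \to \mathbf{R}$ are defined
by
$$[f]_a(y) = f(a, y) \text{ for } y \in Y \quad\text{and}\quad [f]^b(x) = f(x, b) \text{ for } x \in X.$$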


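5.8 Example cross sections
• Suppose $f\colon \mathbf{R} \times \mathbf{R} \to \mathbf{R}$ is defined by $f(x, y) = 5x^2 + y^3$. Then
$$[f]_2(y) = 20 + y^3 \quad\text{and}\quad [f]^3(x) = 5x^2 + 27$$
for all $y \in \mathbf{R}$ and all $x \in \mathbf{R}$, as you should verify.
• Suppose X and Y are sets and $A \subset X$ and $B \subset Y$. If $a \in X$ and $b \in Y$, then
$$[\chi_{A \times B}]_a = \chi_A(a)\,\chi_B \quad\text{and}\quad [\chi_{A \times B}]^b = \chi_B(b)\,\chi_A,$$
as you should verify.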
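Both bullets of Example 5.8 can be spot-checked numerically. In this sketch the helper names are hypothetical, integer inputs keep the arithmetic exact, and a cross section function is represented as a Python closure that freezes one argument of a two-variable function.

```python
def f(x, y):
    return 5 * x**2 + y**3


def section_at_x(g, a):
    """Return the cross section [g]_a, i.e. the function y -> g(a, y)."""
    return lambda y: g(a, y)


def section_at_y(g, b):
    """Return the cross section [g]^b, i.e. the function x -> g(x, b)."""
    return lambda x: g(x, b)


f_2 = section_at_x(f, 2)  # expected: y -> 20 + y**3
f_3 = section_at_y(f, 3)  # expected: x -> 5 * x**2 + 27

# First bullet: compare with the closed forms on integer inputs.
for t in range(-5, 6):
    assert f_2(t) == 20 + t**3
    assert f_3(t) == 5 * t**2 + 27

# Second bullet: cross sections of the characteristic function of A x B.
A, B = {1, 2}, {"u"}


def chi_rect(x, y):
    """Characteristic function of the rectangle A x B."""
    return 1 if x in A and y in B else 0


for a in [0, 1, 2, 3]:
    for y in ["u", "v"]:
        assert section_at_x(chi_rect, a)(y) == (a in A) * (y in B)

print(f_2(1), f_3(1))  # 21 32
```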


The next result shows that cross sections preserve measurability, this time in the 
context of functions rather than sets. 


5.9 cross sections of measurable functions are measurable 


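Suppose $\mathcal{S}$ is a σ-algebra on X and $\mathcal{T}$ is a σ-algebra on Y. Suppose
$f\colon X \times Y \to \mathbf{R}$ is an $\mathcal{S} \otimes \mathcal{T}$-measurable function. Then
$$[f]_a \text{ is a } \mathcal{T}\text{-measurable function on } Y \text{ for every } a \in X$$
and
$$[f]^b \text{ is an } \mathcal{S}\text{-measurable function on } X \text{ for every } b \in Y.$$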


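Proof Suppose D is a Borel subset of $\mathbf{R}$ and $a \in X$. If $y \in Y$, then
$$\begin{aligned} y \in ([f]_a)^{-1}(D) &\iff [f]_a(y) \in D\\ &\iff f(a, y) \in D\\ &\iff (a, y) \in f^{-1}(D)\\ &\iff y \in [f^{-1}(D)]_a. \end{aligned}$$
Thus
$$([f]_a)^{-1}(D) = [f^{-1}(D)]_a.$$
Because f is an $\mathcal{S} \otimes \mathcal{T}$-measurable function, $f^{-1}(D) \in \mathcal{S} \otimes \mathcal{T}$. Thus the equation
above and 5.6 imply that $([f]_a)^{-1}(D) \in \mathcal{T}$. Hence $[f]_a$ is a $\mathcal{T}$-measurable function.
The same ideas show that $[f]^b$ is an $\mathcal{S}$-measurable function for every $b \in Y$.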
Monotone Class Theorem


The following standard two-step technique often works to prove that every set in a
σ-algebra has a certain property:

1. show that every set in a collection of sets that generates the σ-algebra has the
property;
2. show that the collection of sets that has the property is a σ-algebra.


For example, the proof of 5.6 used the technique above: first we showed that every
measurable rectangle in $\mathcal{S} \otimes \mathcal{T}$ has the desired property, then we showed that the
collection of sets that has the desired property is a σ-algebra (this completed the proof
because $\mathcal{S} \otimes \mathcal{T}$ is the smallest σ-algebra containing the measurable rectangles).

The technique outlined above should be used when possible. However, in some 
situations there seems to be no reasonable way to verify that the collection of sets 
with the desired property is a σ-algebra. We will encounter this situation in the next
subsection. To deal with it, we need to introduce another technique that involves 
what are called monotone classes. 

The following definition will be used in our main theorem about monotone classes. 


5.10 Definition algebra 


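Suppose W is a set and $\mathcal{A}$ is a set of subsets of W. Then $\mathcal{A}$ is called an algebra
on W if the following three conditions are satisfied:

• $\varnothing \in \mathcal{A}$;

• if $E \in \mathcal{A}$, then $W \setminus E \in \mathcal{A}$;

• if E and F are elements of $\mathcal{A}$, then $E \cup F \in \mathcal{A}$.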


Thus an algebra is closed under complementation and under finite unions; a
σ-algebra is closed under complementation and countable unions.
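The three conditions of Definition 5.10 can be tested mechanically when W is finite, an assumption made here for illustration. On a finite set every algebra is automatically a σ-algebra, because only finitely many distinct unions are possible, so this sketch exercises only the finite conditions. As a sample algebra it uses all unions of blocks of a partition of W; the helper names are hypothetical.

```python
from itertools import combinations


def is_algebra(W, collection):
    """Check the three conditions of Definition 5.10 for subsets of a finite set W."""
    W = frozenset(W)
    A = {frozenset(E) for E in collection}
    if any(not E <= W for E in A):
        raise ValueError("every member of the collection must be a subset of W")
    if frozenset() not in A:
        return False  # first condition fails
    if any(W - E not in A for E in A):
        return False  # second condition (complements) fails
    # Third condition: unions of pairs (E | E = E needs no check).
    return all(E | F in A for E, F in combinations(A, 2))


def unions_of_blocks(blocks):
    """All unions of sub-collections of the given blocks, including the empty union."""
    blocks = [frozenset(b) for b in blocks]
    return {frozenset().union(*chosen)
            for r in range(len(blocks) + 1)
            for chosen in combinations(blocks, r)}


W = {1, 2, 3, 4}
print(is_algebra(W, unions_of_blocks([{1, 2}, {3}, {4}])))  # True
print(is_algebra(W, [set(), {1}, W]))                       # False: W \ {1} is missing
```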


5.11 Example collection of finite unions of intervals is an algebra


Suppose $\mathcal{A}$ is the collection of all finite unions of intervals of $\mathbf{R}$. Here we are
including all intervals: open intervals, closed intervals, bounded intervals, unbounded
intervals, sets consisting of only a single point, and intervals that are neither open nor
closed because they contain one endpoint but not the other endpoint.

Clearly $\mathcal{A}$ is closed under finite unions. You should also verify that $\mathcal{A}$ is closed
under complementation. Thus $\mathcal{A}$ is an algebra on $\mathbf{R}$.


5.12 Example collection of countable unions of intervals is not an algebra 


Suppose $\mathcal{A}$ is the collection of all countable unions of intervals of $\mathbf{R}$.

Clearly $\mathcal{A}$ is closed under finite unions (and also under countable unions). You
should verify that $\mathcal{A}$ is not closed under complementation. Thus $\mathcal{A}$ is neither an
algebra nor a σ-algebra on $\mathbf{R}$.
The following result provides an example of an algebra that we will exploit.


5.13 the set of finite unions of measurable rectangles is an algebra 


Suppose $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ are measurable spaces. Then

(a) the set of finite unions of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$ is an algebra
on $X \times Y$;

(b) every finite union of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$ can be written as a
finite union of disjoint measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$.


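Proof Let $\mathcal{A}$ denote the set of finite unions of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$.
Obviously $\mathcal{A}$ is closed under finite unions.

The collection $\mathcal{A}$ is also closed under finite intersections. To verify this claim,
note that if $A_1, \ldots, A_n, C_1, \ldots, C_m \in \mathcal{S}$ and $B_1, \ldots, B_n, D_1, \ldots, D_m \in \mathcal{T}$, then
$$\begin{aligned} \bigl((A_1 \times B_1) \cup \cdots \cup (A_n \times B_n)\bigr) \cap \bigl((C_1 \times D_1) \cup \cdots \cup (C_m \times D_m)\bigr) &= \bigcup_{j=1}^{n} \bigcup_{k=1}^{m} \bigl((A_j \times B_j) \cap (C_k \times D_k)\bigr)\\ &= \bigcup_{j=1}^{n} \bigcup_{k=1}^{m} \bigl((A_j \cap C_k) \times (B_j \cap D_k)\bigr), \end{aligned}$$
which implies that $\mathcal{A}$ is closed under finite intersections.

[Figure: Intersection of two rectangles is a rectangle.]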
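If $A \in \mathcal{S}$ and $B \in \mathcal{T}$, then
$$(X \times Y) \setminus (A \times B) = \bigl((X \setminus A) \times Y\bigr) \cup \bigl(X \times (Y \setminus B)\bigr).$$
Hence the complement of each measurable rectangle in $\mathcal{S} \otimes \mathcal{T}$ is in $\mathcal{A}$. Thus the
complement of a finite union of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$ is in $\mathcal{A}$ (use De
Morgan’s Laws and the result in the previous paragraph that $\mathcal{A}$ is closed under finite
intersections). In other words, $\mathcal{A}$ is closed under complementation. Because
$\varnothing = \varnothing \times \varnothing$, we also have $\varnothing \in \mathcal{A}$, completing the proof of (a).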

To prove (b), note that if $A \times B$ and $C \times D$ are measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$,
then (as can be verified in the figure above)
$$(A \times B) \cup (C \times D) = (A \times B) \cup \bigl(C \times (D \setminus B)\bigr) \cup \bigl((C \setminus A) \times (B \cap D)\bigr). \tag{5.14}$$
The equation above writes the union of two measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$ as the
union of three disjoint measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$.

Now consider any finite union of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$. If this is not
a disjoint union, then choose any nondisjoint pair of measurable rectangles in the
union and replace those two measurable rectangles with the union of three disjoint
measurable rectangles as in 5.14. Iterate this process until obtaining a disjoint union
of measurable rectangles.
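The proof of 5.13(b) is constructive, so it can be sketched in Python for rectangles in a product of finite sets. This is an assumption for illustration: rectangles are represented as pairs of frozensets, and all names are hypothetical. Rather than repeatedly selecting a nondisjoint pair as the text does, this variant adds rectangles one at a time and subtracts the rectangles already placed, using $(C \times D) \setminus (A \times B) = (C \times (D \setminus B)) \cup ((C \setminus A) \times (B \cap D))$, which is the part of 5.14 lying outside $A \times B$. Working incrementally makes it easy to see that the loop terminates.

```python
from itertools import product


def rect_difference(R, S):
    """Return (C x D) minus (A x B), where R = (C, D) and S = (A, B), as at most
    two disjoint rectangles: (C x (D - B)) and ((C - A) x (B & D)).
    Rectangles with an empty side are dropped."""
    (C, D), (A, B) = R, S
    pieces = [(C, D - B), (C - A, B & D)]
    return [(P, Q) for (P, Q) in pieces if P and Q]


def disjointify(rects):
    """Rewrite a finite union of rectangles as a finite disjoint union."""
    disjoint = []
    for R in rects:
        pieces = [R] if R[0] and R[1] else []
        for S in disjoint:
            # Remove from each current piece the part that meets S.
            pieces = [p for piece in pieces for p in rect_difference(piece, S)]
        disjoint.extend(pieces)
    return disjoint


def points(rects):
    """The union of the rectangles, as a set of ordered pairs."""
    return {pt for (P, Q) in rects for pt in product(P, Q)}


rects = [(frozenset({1, 2, 3}), frozenset({1, 2})),
         (frozenset({2, 3, 4}), frozenset({2, 3})),
         (frozenset({1, 4}), frozenset({3, 4}))]
d = disjointify(rects)
assert points(d) == points(rects)                            # same union
assert sum(len(P) * len(Q) for P, Q in d) == len(points(d))  # no overlaps
print(len(d), "disjoint rectangles")  # 5 disjoint rectangles
```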
Now we define a monotone class as a collection of sets that is closed under
countable increasing unions and under countable decreasing intersections. 


5.15 Definition monotone class 


Suppose W is a set and $\mathcal{M}$ is a set of subsets of W. Then $\mathcal{M}$ is called a monotone
class on W if the following two conditions are satisfied:

• If $E_1 \subset E_2 \subset \cdots$ is an increasing sequence of sets in $\mathcal{M}$, then $\bigcup_{k=1}^{\infty} E_k \in \mathcal{M}$;

• If $E_1 \supset E_2 \supset \cdots$ is a decreasing sequence of sets in $\mathcal{M}$, then $\bigcap_{k=1}^{\infty} E_k \in \mathcal{M}$.


Clearly every σ-algebra is a monotone class. However, some monotone classes
are not closed under even finite unions, as shown by the next example.


5.16 Example a monotone class that is not an algebra 


Suppose $\mathcal{A}$ is the collection of all intervals of $\mathbf{R}$. Then $\mathcal{A}$ is closed under countable
increasing unions and countable decreasing intersections. Thus $\mathcal{A}$ is a monotone
class on $\mathbf{R}$. However, $\mathcal{A}$ is not closed under finite unions, and $\mathcal{A}$ is not closed under
complementation. Thus $\mathcal{A}$ is neither an algebra nor a σ-algebra on $\mathbf{R}$.


If $\mathcal{A}$ is a collection of subsets of some set W, then the intersection of all monotone
classes on W that contain $\mathcal{A}$ is a monotone class that contains $\mathcal{A}$. Thus this
intersection is the smallest monotone class on W that contains $\mathcal{A}$.

The next result provides a useful tool when the standard technique for showing
that every set in a σ-algebra has a certain property does not work.


5.17 Monotone Class Theorem 


Suppose $\mathcal{A}$ is an algebra on a set W. Then the smallest σ-algebra containing $\mathcal{A}$
is the smallest monotone class containing $\mathcal{A}$.


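Proof Let $\mathcal{M}$ denote the smallest monotone class containing $\mathcal{A}$. Because every
σ-algebra is a monotone class, $\mathcal{M}$ is contained in the smallest σ-algebra containing $\mathcal{A}$.
To prove the inclusion in the other direction, first suppose $A \in \mathcal{A}$. Let
$$\mathcal{E} = \{E \in \mathcal{M} : A \cup E \in \mathcal{M}\}.$$
Then $\mathcal{A} \subset \mathcal{E}$ (because the union of two sets in $\mathcal{A}$ is in $\mathcal{A}$). A moment’s thought
shows that $\mathcal{E}$ is a monotone class. Thus the smallest monotone class that contains $\mathcal{A}$
is contained in $\mathcal{E}$, meaning that $\mathcal{M} \subset \mathcal{E}$. Hence we have proved that $A \cup E \in \mathcal{M}$
for every $E \in \mathcal{M}$.
Now let
$$\mathcal{D} = \{D \in \mathcal{M} : D \cup E \in \mathcal{M} \text{ for all } E \in \mathcal{M}\}.$$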
The previous paragraph shows that $\mathcal{A} \subset \mathcal{D}$. A moment’s thought again shows that $\mathcal{D}$
is a monotone class. Thus, as in the previous paragraph, we conclude that $\mathcal{M} \subset \mathcal{D}$.
Hence we have proved that $D \cup E \in \mathcal{M}$ for all $D, E \in \mathcal{M}$.

The paragraph above shows that the monotone class $\mathcal{M}$ is closed under finite
unions. Now if $E_1, E_2, \ldots \in \mathcal{M}$, then
$$E_1 \cup E_2 \cup E_3 \cup \cdots = E_1 \cup (E_1 \cup E_2) \cup (E_1 \cup E_2 \cup E_3) \cup \cdots,$$
which is an increasing union of a sequence of sets in $\mathcal{M}$ (by the previous paragraph).
We conclude that $\mathcal{M}$ is closed under countable unions.
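Finally, let
$$\mathcal{M}' = \{E \in \mathcal{M} : W \setminus E \in \mathcal{M}\}.$$
Then $\mathcal{A} \subset \mathcal{M}'$ (because $\mathcal{A}$ is closed under complementation). Once again, you
should verify that $\mathcal{M}'$ is a monotone class. Thus $\mathcal{M} \subset \mathcal{M}'$. We conclude that $\mathcal{M}$ is
closed under complementation.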

The two previous paragraphs show that $\mathcal{M}$ is closed under countable unions and
under complementation. Thus $\mathcal{M}$ is a σ-algebra that contains $\mathcal{A}$. Hence $\mathcal{M}$ contains
the smallest σ-algebra containing $\mathcal{A}$, completing the proof.


Products of Measures 


The following definitions will be useful. 


5.18 Definition finite measure; σ-finite measure

• A measure μ on a measurable space $(X, \mathcal{S})$ is called finite if $\mu(X) < \infty$.

• A measure is called σ-finite if the whole space can be written as the countable
union of sets with finite measure.

• More precisely, a measure μ on a measurable space $(X, \mathcal{S})$ is called σ-finite
if there exists a sequence $X_1, X_2, \ldots$ of sets in $\mathcal{S}$ such that
$$X = \bigcup_{k=1}^{\infty} X_k \quad\text{and}\quad \mu(X_k) < \infty \text{ for every } k \in \mathbf{Z}^+.$$


5.19 Example finite and σ-finite measures

• Lebesgue measure on the interval [0, 1] is a finite measure.
• Lebesgue measure on $\mathbf{R}$ is not a finite measure but is a σ-finite measure.
• Counting measure on $\mathbf{R}$ is not a σ-finite measure (because the countable union
of finite sets is a countable set).
The next result will allow us to define the product of two σ-finite measures.


5.20 measure of cross section is a measurable function 


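Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite measure spaces. If $E \in \mathcal{S} \otimes \mathcal{T}$,
then

(a) $x \mapsto \nu([E]_x)$ is an $\mathcal{S}$-measurable function on X;

(b) $y \mapsto \mu([E]^y)$ is a $\mathcal{T}$-measurable function on Y.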


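Proof We will prove (a). If $E \in \mathcal{S} \otimes \mathcal{T}$, then $[E]_x \in \mathcal{T}$ for every $x \in X$ (by 5.6);
thus the function $x \mapsto \nu([E]_x)$ is well defined on X.
We first consider the case where ν is a finite measure. Let
$$\mathcal{M} = \{E \in \mathcal{S} \otimes \mathcal{T} : x \mapsto \nu([E]_x) \text{ is an } \mathcal{S}\text{-measurable function on } X\}.$$
We need to prove that $\mathcal{M} = \mathcal{S} \otimes \mathcal{T}$.

If $A \in \mathcal{S}$ and $B \in \mathcal{T}$, then $\nu([A \times B]_x) = \nu(B)\chi_A(x)$ for every $x \in X$ (by
Example 5.5). Thus the function $x \mapsto \nu([A \times B]_x)$ equals the function $\nu(B)\chi_A$ (as
a function on X), which is an $\mathcal{S}$-measurable function on X. Hence $\mathcal{M}$ contains all
the measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$.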

Let $\mathcal{A}$ denote the set of finite unions of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$. Suppose
$E \in \mathcal{A}$. Then by 5.13(b), E is a union of disjoint measurable rectangles $E_1, \ldots, E_n$.
Thus
$$\begin{aligned} \nu([E]_x) &= \nu([E_1 \cup \cdots \cup E_n]_x)\\ &= \nu([E_1]_x \cup \cdots \cup [E_n]_x)\\ &= \nu([E_1]_x) + \cdots + \nu([E_n]_x), \end{aligned}$$
where the last equality holds because ν is a measure and $[E_1]_x, \ldots, [E_n]_x$ are disjoint.
The equation above, when combined with the conclusion of the previous paragraph,
shows that $x \mapsto \nu([E]_x)$ is a finite sum of $\mathcal{S}$-measurable functions and thus is an
$\mathcal{S}$-measurable function. Hence $E \in \mathcal{M}$. We have now shown that $\mathcal{A} \subset \mathcal{M}$.

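Our next goal is to show that $\mathcal{M}$ is a monotone class on $X \times Y$. To do this, first
suppose $E_1 \subset E_2 \subset \cdots$ is an increasing sequence of sets in $\mathcal{M}$. Then
$$\nu\Bigl(\Bigl[\bigcup_{k=1}^{\infty} E_k\Bigr]_x\Bigr) = \nu\Bigl(\bigcup_{k=1}^{\infty} [E_k]_x\Bigr) = \lim_{k \to \infty} \nu([E_k]_x),$$
where we have used 2.59. Because the pointwise limit of $\mathcal{S}$-measurable functions
is $\mathcal{S}$-measurable (by 2.48), the equation above shows that $x \mapsto \nu\bigl(\bigl[\bigcup_{k=1}^{\infty} E_k\bigr]_x\bigr)$ is
an $\mathcal{S}$-measurable function. Hence $\bigcup_{k=1}^{\infty} E_k \in \mathcal{M}$. We have now shown that $\mathcal{M}$ is
closed under countable increasing unions.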
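Now suppose $E_1 \supset E_2 \supset \cdots$ is a decreasing sequence of sets in $\mathcal{M}$. Then
$$\nu\Bigl(\Bigl[\bigcap_{k=1}^{\infty} E_k\Bigr]_x\Bigr) = \nu\Bigl(\bigcap_{k=1}^{\infty} [E_k]_x\Bigr) = \lim_{k \to \infty} \nu([E_k]_x),$$
where we have used 2.60 (this is where we use the assumption that ν is a finite
measure). Because the pointwise limit of $\mathcal{S}$-measurable functions is $\mathcal{S}$-measurable
(by 2.48), the equation above shows that $x \mapsto \nu\bigl(\bigl[\bigcap_{k=1}^{\infty} E_k\bigr]_x\bigr)$ is an $\mathcal{S}$-measurable
function. Hence $\bigcap_{k=1}^{\infty} E_k \in \mathcal{M}$. We have now shown that $\mathcal{M}$ is closed under
countable decreasing intersections.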

We have shown that $\mathcal{M}$ is a monotone class that contains the algebra $\mathcal{A}$ of all
finite unions of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$ [by 5.13(a), $\mathcal{A}$ is indeed an algebra].
The Monotone Class Theorem (5.17) implies that $\mathcal{M}$ contains the smallest σ-algebra
containing $\mathcal{A}$. In other words, $\mathcal{M}$ contains $\mathcal{S} \otimes \mathcal{T}$. This conclusion completes the
proof of (a) in the case where ν is a finite measure.

Now consider the case where ν is a σ-finite measure. Thus there exists a sequence
$Y_1, Y_2, \ldots$ of sets in $\mathcal{T}$ such that $\bigcup_{k=1}^{\infty} Y_k = Y$ and $\nu(Y_k) < \infty$ for each $k \in \mathbf{Z}^+$.
Replacing each $Y_k$ by $Y_1 \cup \cdots \cup Y_k$, we can assume that $Y_1 \subset Y_2 \subset \cdots$. If
$E \in \mathcal{S} \otimes \mathcal{T}$, then
$$\nu([E]_x) = \lim_{k \to \infty} \nu([E \cap (X \times Y_k)]_x).$$
The function $x \mapsto \nu([E \cap (X \times Y_k)]_x)$ is an $\mathcal{S}$-measurable function on X, as follows
by considering the finite measure obtained by restricting ν to the σ-algebra on $Y_k$
consisting of sets in $\mathcal{T}$ that are contained in $Y_k$. The equation above now implies that
$x \mapsto \nu([E]_x)$ is an $\mathcal{S}$-measurable function on X, completing the proof of (a).
The proof of (b) is similar.


5.21 Definition integration notation 


Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $g\colon X \to [-\infty, \infty]$ is a function. The
notation
$$\int g(x)\,d\mu(x) \quad\text{means}\quad \int g\,d\mu,$$
where $d\mu(x)$ indicates that variables other than x should be treated as constants.


5.22 Example integrals 


If λ is Lebesgue measure on [0, 4], then
$$\int_{[0,4]} (x^2 + y)\,d\lambda(y) = 4x^2 + 8 \quad\text{and}\quad \int_{[0,4]} (x^2 + y)\,d\lambda(x) = \frac{64}{3} + 4y.$$
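The two formulas in Example 5.22 can be checked with exact rational arithmetic. The sketch below relies on the fact that, for continuous integrands like these, the Lebesgue integral over [0, 4] agrees with the Riemann integral. It then uses one panel of Simpson's rule, which is exact for polynomials of degree at most 3, in place of the integral. The names `simpson` and `integrand` are our own.

```python
from fractions import Fraction


def simpson(g, a, b):
    """One-panel Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    a, b = Fraction(a), Fraction(b)
    m = (a + b) / 2
    return (b - a) / 6 * (g(a) + 4 * g(m) + g(b))


def integrand(x, y):
    return x**2 + y


# Integrate with respect to y (x held constant), then with respect to x (y held constant).
for c in range(-3, 4):
    assert simpson(lambda y: integrand(c, y), 0, 4) == 4 * c**2 + 8
    assert simpson(lambda x: integrand(x, c), 0, 4) == Fraction(64, 3) + 4 * c

print(simpson(lambda y: integrand(1, y), 0, 4))  # 12
```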


The intent in the next definition is that $\int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x)$ is defined only
when the inner integral and then the outer integral both make sense.
5.23 Definition iterated integrals

Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are measure spaces and $f\colon X \times Y \to \mathbf{R}$ is a
function. Then
$$\int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x) \quad\text{means}\quad \int_X \Bigl(\int_Y f(x, y)\,d\nu(y)\Bigr)\,d\mu(x).$$
In other words, to compute $\int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x)$, first (temporarily) fix $x \in X$
and compute $\int_Y f(x, y)\,d\nu(y)$ [if this integral makes sense]. Then compute the
integral with respect to μ of the function $x \mapsto \int_Y f(x, y)\,d\nu(y)$ [if this integral
makes sense].


5.24 Example iterated integrals 
If λ is Lebesgue measure on [0, 4], then
$$\int_{[0,4]} \int_{[0,4]} (x^2 + y)\,d\lambda(y)\,d\lambda(x) = \int_{[0,4]} (4x^2 + 8)\,d\lambda(x) = \frac{352}{3}$$
and
$$\int_{[0,4]} \int_{[0,4]} (x^2 + y)\,d\lambda(x)\,d\lambda(y) = \int_{[0,4]} \Bigl(\frac{64}{3} + 4y\Bigr)\,d\lambda(y) = \frac{352}{3}.$$

The two iterated integrals in this example turned out to both equal 352/3, even though
they do not look alike in the intermediate step of the evaluation. As we will see in the
next section, this equality of integrals when changing the order of integration is not a
coincidence.
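The same exact-arithmetic approach evaluates both iterated integrals of Example 5.24 in either order. As before, this assumes the Lebesgue and Riemann integrals agree for these polynomial integrands and uses a one-panel Simpson helper (our own name) that is exact for degree at most 3. The inner integrals, $4x^2 + 8$ and $64/3 + 4y$, are themselves of low degree, so the outer Simpson step is also exact.

```python
from fractions import Fraction


def simpson(g, a, b):
    """One-panel Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    a, b = Fraction(a), Fraction(b)
    m = (a + b) / 2
    return (b - a) / 6 * (g(a) + 4 * g(m) + g(b))


def integrand(x, y):
    return x**2 + y


# Inner integral over y, outer integral over x.
dy_then_dx = simpson(lambda x: simpson(lambda y: integrand(x, y), 0, 4), 0, 4)
# Inner integral over x, outer integral over y.
dx_then_dy = simpson(lambda y: simpson(lambda x: integrand(x, y), 0, 4), 0, 4)

print(dy_then_dx, dx_then_dy)  # 352/3 352/3
```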


The definition of $(\mu \times \nu)(E)$ given below makes sense because the inner integral
below equals $\nu([E]_x)$, which makes sense by 5.6 (or use 5.9), and then the outer
integral makes sense by 5.20(a).

The restriction in the definition below to σ-finite measures is not bothersome because
the main results we seek are not valid without this hypothesis (see Example 5.30
in the next section).


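5.25 Definition product of two measures; $\mu \times \nu$

Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite measure spaces. For $E \in \mathcal{S} \otimes \mathcal{T}$,
define $(\mu \times \nu)(E)$ by
$$(\mu \times \nu)(E) = \int_X \int_Y \chi_E(x, y)\,d\nu(y)\,d\mu(x).$$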
5.26 Example measure of a rectangle

Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite measure spaces. If $A \in \mathcal{S}$ and
$B \in \mathcal{T}$, then
$$\begin{aligned} (\mu \times \nu)(A \times B) &= \int_X \int_Y \chi_{A \times B}(x, y)\,d\nu(y)\,d\mu(x)\\ &= \int_X \nu(B)\chi_A(x)\,d\mu(x)\\ &= \mu(A)\nu(B). \end{aligned}$$
Thus product measure of a measurable rectangle is the product of the measures of the
corresponding sets.
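For intuition, Definition 5.25 and Example 5.26 can be tried out on finite measure spaces in which every subset is measurable and each measure is given by point masses. These spaces are hypothetical and chosen only for illustration. The sketch computes $(\mu \times \nu)(E)$ exactly as the iterated integral $\int_X \nu([E]_x)\,d\mu(x)$, which here is a finite sum. It then checks the rectangle formula and also checks additivity on two disjoint sets, as in 5.27.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite measure spaces: every subset is measurable and each
# measure is given by nonnegative point masses.
mu = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(0)}
nu = {1: Fraction(2), 2: Fraction(1, 3)}


def measure(weights, S):
    """Measure of the set S under the point masses in weights."""
    return sum((weights[s] for s in S), Fraction(0))


def product_measure(mu_w, nu_w, E):
    """(mu x nu)(E) as in 5.25: integrate x -> nu([E]_x) with respect to mu."""
    return sum((mu_w[x] * measure(nu_w, {y for y in nu_w if (x, y) in E}) for x in mu_w),
               Fraction(0))


# Example 5.26: a measurable rectangle.
A, B = {"a", "b"}, {2}
rect = set(product(A, B))
assert product_measure(mu, nu, rect) == measure(mu, A) * measure(nu, B)

# Additivity (as in 5.27) for two disjoint sets.
E1 = {("a", 1), ("b", 2)}
E2 = {("b", 1), ("c", 2)}
assert product_measure(mu, nu, E1 | E2) == (
    product_measure(mu, nu, E1) + product_measure(mu, nu, E2))

print(product_measure(mu, nu, rect))  # 7/6
```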


For $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ σ-finite measure spaces, we defined the product $\mu \times \nu$
to be a function from $\mathcal{S} \otimes \mathcal{T}$ to $[0, \infty]$ (see 5.25). Now we show that this function is
a measure.


5.27 product of two measures is a measure 


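Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite measure spaces. Then $\mu \times \nu$ is a
measure on $(X \times Y, \mathcal{S} \otimes \mathcal{T})$.

Proof Clearly $(\mu \times \nu)(\varnothing) = 0$.
Suppose $E_1, E_2, \ldots$ is a disjoint sequence of sets in $\mathcal{S} \otimes \mathcal{T}$. Then
$$\begin{aligned} (\mu \times \nu)\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) &= \int_X \nu\Bigl(\Bigl[\bigcup_{k=1}^{\infty} E_k\Bigr]_x\Bigr)\,d\mu(x)\\ &= \int_X \nu\Bigl(\bigcup_{k=1}^{\infty} [E_k]_x\Bigr)\,d\mu(x)\\ &= \int_X \Bigl(\sum_{k=1}^{\infty} \nu([E_k]_x)\Bigr)\,d\mu(x)\\ &= \sum_{k=1}^{\infty} \int_X \nu([E_k]_x)\,d\mu(x)\\ &= \sum_{k=1}^{\infty} (\mu \times \nu)(E_k), \end{aligned}$$
where the fourth equality follows from the Monotone Convergence Theorem (3.11;
or see Exercise 10 in Section 3A). The equation above shows that $\mu \times \nu$ satisfies the
countable additivity condition required for a measure.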
EXERCISES 5A


1 Suppose $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ are measurable spaces. Prove that if A is a
nonempty subset of X and B is a nonempty subset of Y such that $A \times B \in \mathcal{S} \otimes \mathcal{T}$,
then $A \in \mathcal{S}$ and $B \in \mathcal{T}$.


2 Suppose $(X, \mathcal{S})$ is a measurable space. Prove that if $E \in \mathcal{S} \otimes \mathcal{S}$, then
$\{x \in X : (x, x) \in E\} \in \mathcal{S}$.


3 Let $\mathcal{B}$ denote the σ-algebra of Borel subsets of $\mathbf{R}$. Show that there exists a set
$E \subset \mathbf{R} \times \mathbf{R}$ such that $[E]_a \in \mathcal{B}$ and $[E]^a \in \mathcal{B}$ for every $a \in \mathbf{R}$, but $E \notin \mathcal{B} \otimes \mathcal{B}$.

4 Suppose $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ are measurable spaces. Prove that if $f\colon X \to \mathbf{R}$ is
$\mathcal{S}$-measurable and $g\colon Y \to \mathbf{R}$ is $\mathcal{T}$-measurable and $h\colon X \times Y \to \mathbf{R}$ is defined
by $h(x, y) = f(x)g(y)$, then h is $(\mathcal{S} \otimes \mathcal{T})$-measurable.

5 Verify the assertion in Example 5.11 that the collection of finite unions of
intervals of $\mathbf{R}$ is closed under complementation.

6 Verify the assertion in Example 5.12 that the collection of countable unions of
intervals of $\mathbf{R}$ is not closed under complementation.

7 Suppose $\mathcal{A}$ is a nonempty collection of subsets of a set W. Show that $\mathcal{A}$ is an
algebra on W if and only if $\mathcal{A}$ is closed under finite intersections and under
complementation.

8 Suppose μ is a measure on a measurable space $(X, \mathcal{S})$. Prove that the following
are equivalent:
(a) The measure μ is σ-finite.
(b) There exists an increasing sequence $X_1 \subset X_2 \subset \cdots$ of sets in $\mathcal{S}$ such that
$X = \bigcup_{k=1}^{\infty} X_k$ and $\mu(X_k) < \infty$ for every $k \in \mathbf{Z}^+$.
(c) There exists a disjoint sequence $X_1, X_2, X_3, \ldots$ of sets in $\mathcal{S}$ such that
$X = \bigcup_{k=1}^{\infty} X_k$ and $\mu(X_k) < \infty$ for every $k \in \mathbf{Z}^+$.

9 Suppose μ and ν are σ-finite measures. Prove that $\mu \times \nu$ is a σ-finite measure.

10 Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite measure spaces. Prove that if ω is
a measure on $\mathcal{S} \otimes \mathcal{T}$ such that $\omega(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{S}$ and all
$B \in \mathcal{T}$, then $\omega = \mu \times \nu$.

[The exercise above means that $\mu \times \nu$ is the unique measure on $\mathcal{S} \otimes \mathcal{T}$ that
behaves as we expect on measurable rectangles.]
5B Iterated Integrals


Tonelli’s Theorem 


Look again at Example 5.24 in the previous section and notice that the value of the
iterated integral was unchanged when we switched the order of integration, even
though switching the order of integration led to different intermediate results. Our
next result states that the order of integration can be switched if the function being
integrated is nonnegative and the measures are σ-finite.


5.28 Tonelli’s Theorem 


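Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite measure spaces. Suppose
$f\colon X \times Y \to [0, \infty]$ is $\mathcal{S} \otimes \mathcal{T}$-measurable. Then

(a) $x \mapsto \int_Y f(x, y)\,d\nu(y)$ is an $\mathcal{S}$-measurable function on X,

(b) $y \mapsto \int_X f(x, y)\,d\mu(x)$ is a $\mathcal{T}$-measurable function on Y,

and
$$\int_{X \times Y} f\,d(\mu \times \nu) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y).$$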


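Proof We begin by considering the special case where $f = \chi_E$ for some $E \in \mathcal{S} \otimes \mathcal{T}$.
In this case,
$$\int_Y \chi_E(x, y)\,d\nu(y) = \nu([E]_x) \text{ for every } x \in X$$
and
$$\int_X \chi_E(x, y)\,d\mu(x) = \mu([E]^y) \text{ for every } y \in Y.$$
Thus (a) and (b) hold in this case by 5.20.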
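First assume that μ and ν are finite measures. Let
$$\mathcal{M} = \Bigl\{E \in \mathcal{S} \otimes \mathcal{T} : \int_X \int_Y \chi_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X \chi_E(x, y)\,d\mu(x)\,d\nu(y)\Bigr\}.$$
If $A \in \mathcal{S}$ and $B \in \mathcal{T}$, then $A \times B \in \mathcal{M}$ because both sides of the equation defining
$\mathcal{M}$ equal $\mu(A)\nu(B)$.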

Let $\mathcal{A}$ denote the set of finite unions of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$. Then
5.13(b) implies that every element of $\mathcal{A}$ is a disjoint union of measurable rectangles
in $\mathcal{S} \otimes \mathcal{T}$. Because both sides of the equation defining $\mathcal{M}$ are additive over finite
disjoint unions, the previous paragraph now implies $\mathcal{A} \subset \mathcal{M}$.

The Monotone Convergence Theorem (3.11) implies that $\mathcal{M}$ is closed under
countable increasing unions. The Bounded Convergence Theorem (3.26) implies
that $\mathcal{M}$ is closed under countable decreasing intersections (this is where we use the
assumption that μ and ν are finite measures).

We have shown that $\mathcal{M}$ is a monotone class that contains the algebra $\mathcal{A}$ of all
finite unions of measurable rectangles in $\mathcal{S} \otimes \mathcal{T}$ [by 5.13(a), $\mathcal{A}$ is indeed an algebra].
The Monotone Class Theorem (5.17) implies that $\mathcal{M}$ contains the smallest σ-algebra
containing $\mathcal{A}$. In other words, $\mathcal{M}$ contains $\mathcal{S} \otimes \mathcal{T}$. Thus
$$\int_X \int_Y \chi_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X \chi_E(x, y)\,d\mu(x)\,d\nu(y) \tag{5.29}$$
for every $E \in \mathcal{S} \otimes \mathcal{T}$.

Now relax the assumption that μ and ν are finite measures. Write X as an
increasing union of sets $X_1 \subset X_2 \subset \cdots$ in $\mathcal{S}$ with finite measure, and write Y
as an increasing union of sets $Y_1 \subset Y_2 \subset \cdots$ in $\mathcal{T}$ with finite measure. Suppose
$E \in \mathcal{S} \otimes \mathcal{T}$. Applying the finite-measure case to the situation where the measures
and the σ-algebras are restricted to $X_j$ and $Y_k$, we can conclude that 5.29 holds
with E replaced by $E \cap (X_j \times Y_k)$ for all $j, k \in \mathbf{Z}^+$. Fix $k \in \mathbf{Z}^+$ and use the
Monotone Convergence Theorem (3.11) to conclude that 5.29 holds with E replaced
by $E \cap (X \times Y_k)$ for all $k \in \mathbf{Z}^+$. One more use of the Monotone Convergence
Theorem then shows that
$$\int_{X \times Y} \chi_E\,d(\mu \times \nu) = \int_X \int_Y \chi_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X \chi_E(x, y)\,d\mu(x)\,d\nu(y)$$
for all $E \in \mathcal{S} \otimes \mathcal{T}$, where the first equality above comes from the definition of
$(\mu \times \nu)(E)$ (see 5.25).

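Now we turn from characteristic functions to the general case of an $\mathcal{S} \otimes \mathcal{T}$-measurable
function $f\colon X \times Y \to [0, \infty]$. Define a sequence $f_1, f_2, \ldots$ of simple
$\mathcal{S} \otimes \mathcal{T}$-measurable functions from $X \times Y$ to $[0, \infty)$ by
$$f_k(x, y) = \begin{cases} \dfrac{m}{2^k} & \text{if } f(x, y) < k \text{ and } m \text{ is the integer with } f(x, y) \in \Bigl[\dfrac{m}{2^k}, \dfrac{m+1}{2^k}\Bigr),\\[1ex] k & \text{if } f(x, y) \ge k. \end{cases}$$
Note that
$$0 \le f_1(x, y) \le f_2(x, y) \le f_3(x, y) \le \cdots \quad\text{and}\quad \lim_{k \to \infty} f_k(x, y) = f(x, y)$$
for all $(x, y) \in X \times Y$.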
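The approximating functions $f_k$ act pointwise, so they can be sketched as a function of the value $f(x, y)$ alone. In the following Python sketch, the name `dyadic_approx` is hypothetical, and floating-point values stand in for elements of $[0, \infty]$, with `math.inf` playing the role of ∞. It implements the two-case definition and checks, on a few sample values, the monotonicity and the bound by $f$ noted above.

```python
import math


def dyadic_approx(value, k):
    """Value of f_k at a point where f takes the value `value` (0 <= value <= inf),
    following the two-case definition of f_k in the proof above."""
    if value >= k:
        return float(k)
    # m is the integer with value in [m / 2**k, (m + 1) / 2**k).
    m = math.floor(value * 2**k)
    return m / 2**k


print([dyadic_approx(math.pi, k) for k in range(1, 6)])  # [1.0, 2.0, 3.0, 3.125, 3.125]

# The approximations increase with k and never exceed the value being approximated.
for v in [0.0, 0.3, 1.7, math.pi, 10.0, math.inf]:
    seq = [dyadic_approx(v, k) for k in range(1, 40)]
    assert all(s <= t for s, t in zip(seq, seq[1:]))
    assert all(s <= v for s in seq)
```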

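Each $f_k$ is a finite sum of functions of the form $c\chi_E$, where $c \in \mathbf{R}$ and $E \in \mathcal{S} \otimes \mathcal{T}$.
Thus, by the characteristic-function case above and the additivity of the integrals
involved, the conclusions of this theorem hold for each function $f_k$.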

The Monotone Convergence Theorem implies that
$$\int_Y f(x, y)\,d\nu(y) = \lim_{k \to \infty} \int_Y f_k(x, y)\,d\nu(y)$$
for every $x \in X$. Thus the function $x \mapsto \int_Y f(x, y)\,d\nu(y)$ is the pointwise limit on
X of a sequence of $\mathcal{S}$-measurable functions. Hence (a) holds, as does (b) for similar
reasons.

The last line in the statement of this theorem holds for each $f_k$. The Monotone
Convergence Theorem now implies that the last line in the statement of this theorem 
holds for f, completing the proof. 
See Exercise 1 in this section for an example (with finite measures) showing that
Tonelli’s Theorem can fail without the hypothesis that the function being integrated
is nonnegative. The next example shows that the hypothesis of σ-finite measures also
cannot be eliminated.


5.30 Example Tonelli’s Theorem can fail without the hypothesis of σ-finite measures

Suppose $\mathcal{B}$ is the σ-algebra of Borel subsets of [0, 1], λ is Lebesgue measure on
$([0, 1], \mathcal{B})$, and μ is counting measure on $([0, 1], \mathcal{B})$. Let D denote the diagonal of
$[0, 1] \times [0, 1]$; in other words,
$$D = \{(x, x) : x \in [0, 1]\}.$$
Then
$$\int_{[0,1]} \int_{[0,1]} \chi_D(x, y)\,d\mu(y)\,d\lambda(x) = \int_{[0,1]} 1\,d\lambda = 1,$$
but
$$\int_{[0,1]} \int_{[0,1]} \chi_D(x, y)\,d\lambda(x)\,d\mu(y) = \int_{[0,1]} 0\,d\mu = 0.$$


The following useful corollary of Tonelli’s Theorem states that we can switch the
order of summation in a double sum of nonnegative numbers. Exercise 2 asks you
to find a double sum of real numbers in which switching the order of summation
changes the value of the double sum.


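5.31 double sums of nonnegative numbers

If $\{x_{j,k} : j, k \in \mathbf{Z}^+\}$ is a doubly indexed collection of nonnegative numbers, then
$$\sum_{j=1}^{\infty} \sum_{k=1}^{\infty} x_{j,k} = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} x_{j,k}.$$

Proof Apply Tonelli’s Theorem (5.28) to $\mu \times \mu$, where μ is counting measure
on $\mathbf{Z}^+$.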


Fubini’s Theorem 


Our next goal is Fubini’s Theorem, which has the same conclusions as Tonelli’s
Theorem but has a different hypothesis. Tonelli’s Theorem requires the function
being integrated to be nonnegative. Fubini’s Theorem instead requires the integral
of the absolute value of the function to be finite. When using Fubini’s Theorem to
evaluate the integral of f, you will usually first use Tonelli’s Theorem as applied
to |f| to verify the hypothesis of Fubini’s Theorem.


Historically, Fubini’s Theorem (proved in 1907) came before Tonelli’s Theorem
(proved in 1909). However, presenting Tonelli’s Theorem first, as is done here, seems
to lead to simpler proofs and better understanding. The hard work here went into
proving Tonelli’s Theorem; thus our proof of Fubini’s Theorem consists mainly of
bookkeeping details.
As you will see in the proof of Fubini’s Theorem, the function in 5.32(a) is defined
only for almost every $x \in X$ and the function in 5.32(b) is defined only for almost
every $y \in Y$. For convenience, you can think of these functions as equaling 0 on the
sets of measure 0 on which they are otherwise undefined.


5.32 Fubini’s Theorem 
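Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite measure spaces. Suppose
$f\colon X \times Y \to [-\infty, \infty]$ is $\mathcal{S} \otimes \mathcal{T}$-measurable and $\int_{X \times Y} |f|\,d(\mu \times \nu) < \infty$.
Then
$$\int_Y |f(x, y)|\,d\nu(y) < \infty \text{ for almost every } x \in X$$
and
$$\int_X |f(x, y)|\,d\mu(x) < \infty \text{ for almost every } y \in Y.$$
Furthermore,

(a) $x \mapsto \int_Y f(x, y)\,d\nu(y)$ is an $\mathcal{S}$-measurable function on X,

(b) $y \mapsto \int_X f(x, y)\,d\mu(x)$ is a $\mathcal{T}$-measurable function on Y,

and
$$\int_{X \times Y} f\,d(\mu \times \nu) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y).$$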


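Proof Tonelli’s Theorem (5.28) applied to the nonnegative function |f| implies that
$x \mapsto \int_Y |f(x, y)|\,d\nu(y)$ is an $\mathcal{S}$-measurable function on X. Hence
$$\Bigl\{x \in X : \int_Y |f(x, y)|\,d\nu(y) = \infty\Bigr\} \in \mathcal{S}.$$
Tonelli’s Theorem applied to |f| also tells us that
$$\int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) < \infty$$
because the iterated integral above equals $\int_{X \times Y} |f|\,d(\mu \times \nu)$. The inequality above
implies that
$$\mu\Bigl(\Bigl\{x \in X : \int_Y |f(x, y)|\,d\nu(y) = \infty\Bigr\}\Bigr) = 0.$$

Recall that $f^+$ and $f^-$ are nonnegative $\mathcal{S} \otimes \mathcal{T}$-measurable functions such that
$|f| = f^+ + f^-$ and $f = f^+ - f^-$ (see 3.17). Applying Tonelli’s Theorem to $f^+$
and $f^-$, we see that
$$x \mapsto \int_Y f^+(x, y)\,d\nu(y) \quad\text{and}\quad x \mapsto \int_Y f^-(x, y)\,d\nu(y) \tag{5.33}$$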
are $\mathcal{S}$-measurable functions from X to $[0, \infty]$. Because $f^+ \le |f|$ and $f^- \le |f|$, the
sets $\{x \in X : \int_Y f^+(x, y)\,d\nu(y) = \infty\}$ and $\{x \in X : \int_Y f^-(x, y)\,d\nu(y) = \infty\}$
have μ-measure 0. Thus the intersection of these two sets, which is the set of $x \in X$
such that $\int_Y f(x, y)\,d\nu(y)$ is not defined, also has μ-measure 0.

Subtracting the second function in 5.33 from the first function in 5.33, we see that
the function that we define to be 0 for those $x \in X$ where we encounter $\infty - \infty$ (a
set of μ-measure 0, as noted above) and that equals $\int_Y f(x, y)\,d\nu(y)$ elsewhere is an
$\mathcal{S}$-measurable function on X.

Now
$$\begin{aligned} \int_{X \times Y} f\,d(\mu \times \nu) &= \int_{X \times Y} f^+\,d(\mu \times \nu) - \int_{X \times Y} f^-\,d(\mu \times \nu)\\ &= \int_X \int_Y f^+(x, y)\,d\nu(y)\,d\mu(x) - \int_X \int_Y f^-(x, y)\,d\nu(y)\,d\mu(x)\\ &= \int_X \int_Y \bigl(f^+(x, y) - f^-(x, y)\bigr)\,d\nu(y)\,d\mu(x)\\ &= \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x), \end{aligned}$$
where the first line above comes from the definition of the integral of a function that
is not nonnegative (note that neither of the two terms on the right side of the first line
equals ∞ because $\int_{X \times Y} |f|\,d(\mu \times \nu) < \infty$) and the second line comes from applying
Tonelli’s Theorem to $f^+$ and $f^-$.
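As a sanity check of the bookkeeping in this proof, the sketch below works on hypothetical finite measure spaces given by point masses, where every function is integrable and the hypothesis of Fubini’s Theorem therefore holds trivially. It uses Example 5.26 to give each point $(x, y)$ the measure $\mu(\{x\})\nu(\{y\})$. For a function taking both signs, it then compares the integral over $X \times Y$, the difference of the integrals of $f^+$ and $f^-$, and both iterated integrals.

```python
from fractions import Fraction

# Hypothetical finite measure spaces given by point masses, and a signed function f.
mu = {0: Fraction(1), 1: Fraction(1, 2)}
nu = {0: Fraction(2), 1: Fraction(1), 2: Fraction(1, 4)}
f = {(0, 0): 3, (0, 1): -5, (0, 2): 1,
     (1, 0): -2, (1, 1): 4, (1, 2): -8}


def positive_part(t):
    return max(t, 0)


def negative_part(t):
    return max(-t, 0)


def identity(t):
    return t


def product_integral(g):
    """Integral of g(f) over X x Y; the point (x, y) has measure mu[x] * nu[y]."""
    return sum(Fraction(g(f[x, y])) * mu[x] * nu[y] for x in mu for y in nu)


def y_then_x(g):
    """Iterated integral: inner integral over Y, outer integral over X."""
    return sum(mu[x] * sum(Fraction(g(f[x, y])) * nu[y] for y in nu) for x in mu)


def x_then_y(g):
    """Iterated integral: inner integral over X, outer integral over Y."""
    return sum(nu[y] * sum(Fraction(g(f[x, y])) * mu[x] for x in mu) for y in nu)


total = product_integral(identity)
assert total == product_integral(positive_part) - product_integral(negative_part)
assert total == y_then_x(identity) == x_then_y(identity)
print(total)  # 1/4
```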

We have now proved all aspects of Fubini’s Theorem that involve integrating first 
over Y. The same procedure provides proofs for the aspects of Fubini’s Theorem that
involve integrating first over X. 


Area Under Graph 


5.34 Definition region under the graph 


Suppose X is a set and $f\colon X \to [0, \infty]$ is a function. Then the region under the
graph of f, denoted $U_f$, is defined by
$$U_f = \{(x, t) \in X \times (0, \infty) : 0 < t < f(x)\}.$$


The figure indicates why we call $U_f$ the region under the graph of f, even
in cases when X is not a subset of $\mathbf{R}$. Similarly, the informal term area in the
next paragraph should remind you of the area in the figure, even though we are
really dealing with the measure of $U_f$ in a product space.

[Figure: the region $U_f$ under the graph of f.]
The first equality in the result below can be thought of as recovering Riemann’s
conception of the integral as the area under the graph (although now in a much more
general context with arbitrary σ-finite measures). The second equality in the result
below can be thought of as reinforcing Lebesgue’s conception of computing the area 
under a curve by integrating in the direction perpendicular to Riemann’s. 


5.35 area under the graph of a function equals the integral 


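Suppose $(X, \mathcal{S}, \mu)$ is a σ-finite measure space and $f\colon X \to [0, \infty]$ is an
$\mathcal{S}$-measurable function. Let $\mathcal{B}$ denote the σ-algebra of Borel subsets of $(0, \infty)$,
and let λ denote Lebesgue measure on $((0, \infty), \mathcal{B})$. Then $U_f \in \mathcal{S} \otimes \mathcal{B}$ and
$$(\mu \times \lambda)(U_f) = \int_X f\,d\mu = \int_{(0, \infty)} \mu(\{x \in X : t < f(x)\})\,d\lambda(t).$$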


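Proof For $k \in \mathbf{Z}^+$, let
$$E_k = \bigcup_{m=0}^{k2^k - 1} \Bigl( f^{-1}\Bigl(\Bigl[\frac{m}{2^k}, \frac{m+1}{2^k}\Bigr)\Bigr) \times \Bigl(0, \frac{m}{2^k}\Bigr) \Bigr) \quad\text{and}\quad F_k = f^{-1}([k, \infty]) \times (0, k).$$
Then $E_k$ is a finite union of $\mathcal{S} \otimes \mathcal{B}$-measurable rectangles and $F_k$ is an $\mathcal{S} \otimes \mathcal{B}$-measurable
rectangle. Because
$$U_f = \bigcup_{k=1}^{\infty} (E_k \cup F_k),$$
we conclude that $U_f \in \mathcal{S} \otimes \mathcal{B}$.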
Now the definition of the product measure $\mu \times \lambda$ implies that
$$(\mu \times \lambda)(U_f) = \int_X \int_{(0, \infty)} \chi_{U_f}(x, t)\,d\lambda(t)\,d\mu(x) = \int_X f(x)\,d\mu(x),$$
which completes the proof of the first equality in the conclusion of this theorem.
Tonelli’s Theorem (5.28) tells us that we can interchange the order of integration
in the double integral above, getting
$$(\mu \times \lambda)(U_f) = \int_{(0, \infty)} \int_X \chi_{U_f}(x, t)\,d\mu(x)\,d\lambda(t) = \int_{(0, \infty)} \mu(\{x \in X : t < f(x)\})\,d\lambda(t),$$
which completes the proof of the second equality in the conclusion of this theorem.
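On a finite measure space with point masses, a hypothetical setting chosen for illustration, both sides of the second equality in 5.35 can be computed exactly. The function $t \mapsto \mu(\{x \in X : t < f(x)\})$ is then a step function whose jumps occur only at values of f, so its integral over $(0, \infty)$ reduces to a finite sum. The helper names are our own.

```python
from fractions import Fraction

# Hypothetical finite measure space given by point masses, and a
# nonnegative function f that takes finitely many (finite) values.
mu = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(1)}
f = {"a": Fraction(3), "b": Fraction(1, 2), "c": Fraction(0)}


def integral(values, weights):
    """Integral of the function with the given values: sum of values[x] * weights[x]."""
    return sum((values[x] * weights[x] for x in weights), Fraction(0))


def layer_cake(values, weights):
    """Integral over (0, infinity) of t -> measure of {x : t < values[x]}.

    Between consecutive levels lo < hi (the values of the function, with 0 added),
    the set {x : t < values[x]} equals {x : values[x] > lo} for every t in (lo, hi);
    beyond the largest value the set is empty and contributes nothing."""
    levels = sorted({Fraction(0)} | set(values.values()))
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        mass = sum((weights[x] for x in weights if values[x] > lo), Fraction(0))
        total += (hi - lo) * mass
    return total


print(integral(f, mu), layer_cake(f, mu))  # 5/2 5/2
```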


Markov’s inequality (4.1) implies that if f and μ are as in the result above, then
$$\mu(\{x \in X : f(x) > t\}) \le \frac{1}{t} \int_X f\,d\mu$$
for all $t > 0$. Thus if $\int_X f\,d\mu < \infty$, then the result above should be considered to be
somewhat stronger than Markov’s inequality (because $\int_{(0, \infty)} \frac{1}{t}\,d\lambda(t) = \infty$).
EXERCISES 5B


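1 (a) Let λ denote Lebesgue measure on [0, 1]. Show that
$$\int_{[0,1]} \int_{[0,1]} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,d\lambda(y)\,d\lambda(x) = \frac{\pi}{4}$$
and
$$\int_{[0,1]} \int_{[0,1]} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,d\lambda(x)\,d\lambda(y) = -\frac{\pi}{4}.$$
(b) Explain why (a) violates neither Tonelli’s Theorem nor Fubini’s Theorem.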


2 (a) Give an example of a doubly indexed collection $\{x_{m,n} : m, n \in \mathbf{Z}^+\}$ of
real numbers such that
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} x_{m,n} = 0 \quad\text{and}\quad \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} x_{m,n} = \infty.$$
(b) Explain why (a) violates neither Tonelli’s Theorem nor Fubini’s Theorem.


3 Suppose $(X, \mathcal{S})$ is a measurable space and $f\colon X \to [0, \infty]$ is a function. Let $\mathcal{B}$
denote the σ-algebra of Borel subsets of $(0, \infty)$. Prove that $U_f \in \mathcal{S} \otimes \mathcal{B}$ if and
only if f is an $\mathcal{S}$-measurable function.


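4 Suppose $(X,\mathcal{S})$ is a measurable space and $f\colon X \to \mathbf{R}$ is a function. Let $\operatorname{graph}(f) \subset X \times \mathbf{R}$ denote the graph of $f$:

$$\operatorname{graph}(f) = \{(x, f(x)) : x \in X\}.$$

Let $\mathcal{B}$ denote the $\sigma$-algebra of Borel subsets of $\mathbf{R}$. Prove that $\operatorname{graph}(f) \in \mathcal{S}\otimes\mathcal{B}$ if and only if $f$ is an $\mathcal{S}$-measurable function.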
5C Lebesgue Integration on $\mathbf{R}^n$


Throughout this section, assume that m and n are positive integers. Thus, for example, 
5.36 should include the hypothesis that m and n are positive integers, but theorems 
and definitions become easier to state without explicitly repeating this hypothesis. 


Borel Subsets of $\mathbf{R}^n$


We begin with a quick review of notation and key concepts concerning $\mathbf{R}^n$. Recall that $\mathbf{R}^n$ is the set of all $n$-tuples of real numbers:

$$\mathbf{R}^n = \{(x_1,\dots,x_n) : x_1,\dots,x_n \in \mathbf{R}\}.$$

The function $\|\cdot\|_\infty$ from $\mathbf{R}^n$ to $[0,\infty)$ is defined by

$$\|(x_1,\dots,x_n)\|_\infty = \max\{|x_1|,\dots,|x_n|\}.$$

For $x \in \mathbf{R}^n$ and $\delta > 0$, the open cube $B(x,\delta)$ with side length $2\delta$ is defined by

$$B(x,\delta) = \{y \in \mathbf{R}^n : \|y - x\|_\infty < \delta\}.$$


If n = 1, then an open cube is simply a bounded open interval. If n = 2, then an 
open cube might more appropriately be called an open square. However, using the 
cube terminology for all dimensions has the advantage of not requiring a different 
word for different dimensions. 

A subset $G$ of $\mathbf{R}^n$ is called open if for every $x \in G$, there exists $\delta > 0$ such that $B(x,\delta) \subset G$. Equivalently, a subset $G$ of $\mathbf{R}^n$ is called open if every element of $G$ is contained in an open cube that is contained in $G$.

The union of every collection (finite or infinite) of open subsets of $\mathbf{R}^n$ is an open subset of $\mathbf{R}^n$. Also, the intersection of every finite collection of open subsets of $\mathbf{R}^n$ is an open subset of $\mathbf{R}^n$.

A subset of $\mathbf{R}^n$ is called closed if its complement in $\mathbf{R}^n$ is open. A set $A \subset \mathbf{R}^n$ is called bounded if $\sup\{\|a\|_\infty : a \in A\} < \infty$.

We adopt the following common convention: 


$\mathbf{R}^m \times \mathbf{R}^n$ is identified with $\mathbf{R}^{m+n}$.

To understand the necessity of this convention, note that $\mathbf{R}^2 \times \mathbf{R} \ne \mathbf{R}^3$ because $\mathbf{R}^2 \times \mathbf{R}$ and $\mathbf{R}^3$ contain different kinds of objects. Specifically, an element of $\mathbf{R}^2 \times \mathbf{R}$ is an ordered pair, the first of which is an element of $\mathbf{R}^2$ and the second of which is an element of $\mathbf{R}$; thus an element of $\mathbf{R}^2 \times \mathbf{R}$ looks like $((x_1,x_2),x_3)$. An element of $\mathbf{R}^3$ is an ordered triple of real numbers that looks like $(x_1,x_2,x_3)$. However, we can identify $((x_1,x_2),x_3)$ with $(x_1,x_2,x_3)$ in the obvious way. Thus we say that $\mathbf{R}^2 \times \mathbf{R}$ “equals” $\mathbf{R}^3$. More generally, we make the natural identification of $\mathbf{R}^m \times \mathbf{R}^n$ with $\mathbf{R}^{m+n}$.

To check that you understand the identification discussed above, make sure that you see why $B(x,\delta) \times B(y,\delta) = B((x,y),\delta)$ for all $x \in \mathbf{R}^m$, $y \in \mathbf{R}^n$, and $\delta > 0$.
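The identification works because the $\|\cdot\|_\infty$ norm of a concatenated vector is the larger of the $\|\cdot\|_\infty$ norms of its two pieces. The following Python sketch (the names `sup_norm`, `in_cube`, `identify`, and `cubes_agree` are hypothetical) tests the equality $B(x,\delta) \times B(y,\delta) = B((x,y),\delta)$ at the points of a finite grid; such a test only spot-checks the claim and is no substitute for the one-line argument.

```python
from itertools import product


def sup_norm(x):
    """||x||_infinity: the largest absolute value of a coordinate."""
    return max(abs(c) for c in x)


def in_cube(point, center, delta):
    """True when point lies in the open cube B(center, delta)."""
    return sup_norm([p - c for p, c in zip(point, center)]) < delta


def identify(x, y):
    """Send ((x_1,...,x_m), (y_1,...,y_n)) in R^m x R^n to a point of R^(m+n)."""
    return tuple(x) + tuple(y)


def cubes_agree(x, y, delta, grid):
    """Compare B(x,delta) x B(y,delta) with B((x,y),delta) at all grid points."""
    for u in product(grid, repeat=len(x)):
        for v in product(grid, repeat=len(y)):
            in_product = in_cube(u, x, delta) and in_cube(v, y, delta)
            in_big_cube = in_cube(identify(u, v), identify(x, y), delta)
            if in_product != in_big_cube:
                return False
    return True


grid = [k / 4 for k in range(-8, 9)]  # sample points from -2 to 2
print(cubes_agree((0.0, 0.5), (1.0,), 0.75, grid))  # True
```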

We can now prove that the product of two open sets is an open set. 
5.36 product of open sets is open


Suppose $G_1$ is an open subset of $\mathbf{R}^m$ and $G_2$ is an open subset of $\mathbf{R}^n$. Then $G_1 \times G_2$ is an open subset of $\mathbf{R}^{m+n}$.

Proof Suppose $(x,y) \in G_1 \times G_2$. Then there exists an open cube $D$ in $\mathbf{R}^m$ centered at $x$ and an open cube $E$ in $\mathbf{R}^n$ centered at $y$ such that $D \subset G_1$ and $E \subset G_2$. By reducing the size of either $D$ or $E$, we can assume that the cubes $D$ and $E$ have the same side length. Thus $D \times E$ is an open cube in $\mathbf{R}^{m+n}$ centered at $(x,y)$ that is contained in $G_1 \times G_2$.

We have shown that an arbitrary point in $G_1 \times G_2$ is the center of an open cube contained in $G_1 \times G_2$. Hence $G_1 \times G_2$ is an open subset of $\mathbf{R}^{m+n}$.


When $n = 1$, the definition below of a Borel subset of $\mathbf{R}^1$ agrees with our previous definition (2.29) of a Borel subset of $\mathbf{R}$.


5.37 Definition Borel set; $\mathcal{B}_n$

• A Borel subset of $\mathbf{R}^n$ is an element of the smallest $\sigma$-algebra on $\mathbf{R}^n$ containing all open subsets of $\mathbf{R}^n$.

• The $\sigma$-algebra of Borel subsets of $\mathbf{R}^n$ is denoted by $\mathcal{B}_n$.


Recall that a subset of $\mathbf{R}$ is open if and only if it is a countable disjoint union of open intervals. Part (a) in the result below provides a similar result in $\mathbf{R}^n$, although we must give up the disjoint aspect.


5.38 open sets are countable unions of open cubes

(a) A subset of $\mathbf{R}^n$ is open in $\mathbf{R}^n$ if and only if it is a countable union of open cubes in $\mathbf{R}^n$.

(b) $\mathcal{B}_n$ is the smallest $\sigma$-algebra on $\mathbf{R}^n$ containing all the open cubes in $\mathbf{R}^n$.


Proof We will prove (a), which clearly implies (b).

The proof that a countable union of open cubes is open is left as an exercise for the reader (actually, arbitrary unions of open cubes are open).

To prove the other direction, suppose $G$ is an open subset of $\mathbf{R}^n$. For each $x \in G$, there is an open cube centered at $x$ that is contained in $G$. Thus there is a smaller cube $C_x$ such that $x \in C_x \subset G$ and all coordinates of the center of $C_x$ are rational numbers and the side length of $C_x$ is a rational number. Now

$$G = \bigcup_{x \in G} C_x.$$

However, there are only countably many distinct cubes whose center has all rational coordinates and whose side length is rational. Thus $G$ is the countable union of open cubes.
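The only computational step in the proof above is replacing a cube centered at $x$ by a cube $C_x$ with rational center and rational side length. One explicit way to do this, sketched below in Python under the assumption that $\delta > 0$ is given with $B(x,\delta) \subset G$, is to take half side length $s = 1/N$ with $N \ge 2/\delta$ and to round each coordinate of $x$ to the nearest multiple of $1/(4N)$; the triangle inequality for $\|\cdot\|_\infty$ then gives $x \in B(c,s) \subset B(x,\delta)$. The function name `rational_cube` is hypothetical, and this is only one of many valid choices.

```python
from fractions import Fraction
import math


def rational_cube(x, delta):
    """Return (center, half_side), all rational, such that
    x lies in B(center, half_side) and B(center, half_side) is contained in
    B(x, delta), where B(c, s) = {y : ||y - c||_inf < s}.
    """
    delta = Fraction(delta)            # exact value of the given delta
    n = math.ceil(2 / delta)           # guarantees half_side = 1/n <= delta/2
    half_side = Fraction(1, n)
    m = 4 * n                          # round to multiples of 1/(4n): error <= 1/(8n)
    center = tuple(Fraction(round(Fraction(c) * m), m) for c in x)
    return center, half_side


x = (0.3, -1.7)
center, half_side = rational_cube(x, 0.25)
gap = max(abs(c - Fraction(xi)) for c, xi in zip(center, x))
# First value: x is in the small cube. Second: the small cube fits in B(x, 0.25).
print(gap < half_side, gap + half_side <= Fraction(0.25))  # True True
```

The side length of the resulting cube is $2s$, which is rational, as the proof requires.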
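The next result tells us that the collections of Borel sets from various dimensions fit together nicely.

5.39 product of the Borel subsets of $\mathbf{R}^m$ and the Borel subsets of $\mathbf{R}^n$

$$\mathcal{B}_m \otimes \mathcal{B}_n = \mathcal{B}_{m+n}.$$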


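Proof Suppose $E$ is an open cube in $\mathbf{R}^{m+n}$. Thus $E$ is the product of an open cube in $\mathbf{R}^m$ and an open cube in $\mathbf{R}^n$. Hence $E \in \mathcal{B}_m \otimes \mathcal{B}_n$. Thus the smallest $\sigma$-algebra containing all the open cubes in $\mathbf{R}^{m+n}$ is contained in $\mathcal{B}_m \otimes \mathcal{B}_n$. Now 5.38(b) implies that $\mathcal{B}_{m+n} \subset \mathcal{B}_m \otimes \mathcal{B}_n$.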

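To prove the set inclusion in the other direction, temporarily fix an open set $G$ in $\mathbf{R}^n$. Let

$$\mathcal{E} = \{A \subset \mathbf{R}^m : A \times G \in \mathcal{B}_{m+n}\}.$$

Then $\mathcal{E}$ contains every open subset of $\mathbf{R}^m$ (as follows from 5.36). Also, $\mathcal{E}$ is closed under countable unions because

$$\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr) \times G = \bigcup_{k=1}^{\infty} (A_k \times G).$$

Furthermore, $\mathcal{E}$ is closed under complementation because

$$(\mathbf{R}^m \setminus A) \times G = \bigl((\mathbf{R}^m \times \mathbf{R}^n) \setminus (A \times G)\bigr) \cap (\mathbf{R}^m \times G).$$

Thus $\mathcal{E}$ is a $\sigma$-algebra on $\mathbf{R}^m$ that contains all open subsets of $\mathbf{R}^m$, which implies that $\mathcal{B}_m \subset \mathcal{E}$. In other words, we have proved that if $A \in \mathcal{B}_m$ and $G$ is an open subset of $\mathbf{R}^n$, then $A \times G \in \mathcal{B}_{m+n}$.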

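Now temporarily fix a Borel subset $A$ of $\mathbf{R}^m$. Let

$$\mathcal{F} = \{B \subset \mathbf{R}^n : A \times B \in \mathcal{B}_{m+n}\}.$$

The conclusion of the previous paragraph shows that $\mathcal{F}$ contains every open subset of $\mathbf{R}^n$. As in the previous paragraph, we also see that $\mathcal{F}$ is a $\sigma$-algebra. Hence $\mathcal{B}_n \subset \mathcal{F}$. In other words, we have proved that if $A \in \mathcal{B}_m$ and $B \in \mathcal{B}_n$, then $A \times B \in \mathcal{B}_{m+n}$. Thus $\mathcal{B}_m \otimes \mathcal{B}_n \subset \mathcal{B}_{m+n}$, completing the proof.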


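The previous result implies a nice associative property. Specifically, if $m$, $n$, and $p$ are positive integers, then two applications of 5.39 give

$$(\mathcal{B}_m \otimes \mathcal{B}_n) \otimes \mathcal{B}_p = \mathcal{B}_{m+n} \otimes \mathcal{B}_p = \mathcal{B}_{m+n+p}.$$

Similarly, two more applications of 5.39 give

$$\mathcal{B}_m \otimes (\mathcal{B}_n \otimes \mathcal{B}_p) = \mathcal{B}_m \otimes \mathcal{B}_{n+p} = \mathcal{B}_{m+n+p}.$$

Thus $(\mathcal{B}_m \otimes \mathcal{B}_n) \otimes \mathcal{B}_p = \mathcal{B}_m \otimes (\mathcal{B}_n \otimes \mathcal{B}_p)$; hence we can dispense with parentheses when taking products of more than two Borel $\sigma$-algebras. More generally, we could have defined $\mathcal{B}_m \otimes \mathcal{B}_n \otimes \mathcal{B}_p$ directly as the smallest $\sigma$-algebra on $\mathbf{R}^{m+n+p}$ containing $\{A \times B \times C : A \in \mathcal{B}_m, B \in \mathcal{B}_n, C \in \mathcal{B}_p\}$ and obtained the same $\sigma$-algebra (see Exercise 3 in this section).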
Lebesgue Measure on $\mathbf{R}^n$

5.40 Definition Lebesgue measure; $\lambda_n$

Lebesgue measure on $\mathbf{R}^n$ is denoted by $\lambda_n$ and is defined inductively by

$$\lambda_n = \lambda_{n-1} \times \lambda_1,$$

where $\lambda_1$ is Lebesgue measure on $(\mathbf{R}, \mathcal{B}_1)$.


Because $\mathcal{B}_n = \mathcal{B}_{n-1} \otimes \mathcal{B}_1$ (by 5.39), the measure $\lambda_n$ is defined on the Borel subsets of $\mathbf{R}^n$. Thinking of a typical point in $\mathbf{R}^n$ as $(x,y)$, where $x \in \mathbf{R}^{n-1}$ and $y \in \mathbf{R}$, we can use the definition of the product of two measures (5.25) to write

$$\lambda_n(E) = \int_{\mathbf{R}^{n-1}} \int_{\mathbf{R}} \chi_E(x,y)\,d\lambda_1(y)\,d\lambda_{n-1}(x)$$

for $E \in \mathcal{B}_n$. Of course, we could use Tonelli’s Theorem (5.28) to interchange the order of integration in the equation above.

Because Lebesgue measure is the most commonly used measure, mathematicians 
often dispense with explicitly displaying the measure and just use a variable name. 
In other words, if no measure is explicitly displayed in an integral and the context 
indicates no other measure, then you should assume that the measure involved 
is Lebesgue measure in the appropriate dimension. For example, the result of 
interchanging the order of integration in the equation above could be written as 


$$\lambda_n(E) = \int_{\mathbf{R}} \int_{\mathbf{R}^{n-1}} \chi_E(x,y)\,dx\,dy$$

for $E \in \mathcal{B}_n$; here $dx$ means $d\lambda_{n-1}(x)$ and $dy$ means $d\lambda_1(y)$.

In the equations above giving formulas for $\lambda_n(E)$, the integral over $\mathbf{R}^{n-1}$ could be rewritten as an iterated integral over $\mathbf{R}^{n-2}$ and $\mathbf{R}$, and that process could be repeated until reaching iterated integrals only over $\mathbf{R}$. Tonelli’s Theorem could then be used repeatedly to swap the order of pairs of those iterated integrals, leading to iterated integrals in any order.

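Similar comments apply to integrating functions on $\mathbf{R}^n$ other than characteristic functions. For example, if $f\colon \mathbf{R}^3 \to \mathbf{R}$ is a $\mathcal{B}_3$-measurable function such that either $f \ge 0$ or $\int_{\mathbf{R}^3} |f|\,d\lambda_3 < \infty$, then by either Tonelli’s Theorem or Fubini’s Theorem we have

$$\int_{\mathbf{R}^3} f\,d\lambda_3 = \int_{\mathbf{R}} \int_{\mathbf{R}} \int_{\mathbf{R}} f(x_1,x_2,x_3)\,dx_j\,dx_k\,dx_m,$$

where $j, k, m$ is any permutation of $1, 2, 3$.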

Although we defined $\lambda_n$ to be $\lambda_{n-1} \times \lambda_1$, we could have defined $\lambda_n$ to be $\lambda_j \times \lambda_k$ for any positive integers $j, k$ with $j + k = n$. This potentially different definition would have led to the same $\sigma$-algebra $\mathcal{B}_n$ (by 5.39) and to the same measure $\lambda_n$ [because both potential definitions of $\lambda_n(E)$ can be written as identical iterations of $n$ integrals with respect to $\lambda_1$].
Volume of Unit Ball in $\mathbf{R}^n$


The proof of the next result provides good experience in working with the Lebesgue measure $\lambda_n$. Recall that $tE = \{tx : x \in E\}$.

5.41 measure of a dilation

Suppose $t > 0$. If $E \in \mathcal{B}_n$, then $tE \in \mathcal{B}_n$ and $\lambda_n(tE) = t^n \lambda_n(E)$.


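Proof Let

$$\mathcal{E} = \{E \in \mathcal{B}_n : tE \in \mathcal{B}_n\}.$$

Then $\mathcal{E}$ contains every open subset of $\mathbf{R}^n$ (because if $E$ is open in $\mathbf{R}^n$ then $tE$ is open in $\mathbf{R}^n$). Also, $\mathcal{E}$ is closed under complementation and countable unions because

$$t(\mathbf{R}^n \setminus E) = \mathbf{R}^n \setminus (tE) \quad\text{and}\quad t\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) = \bigcup_{k=1}^{\infty} (tE_k).$$

Hence $\mathcal{E}$ is a $\sigma$-algebra on $\mathbf{R}^n$ containing the open subsets of $\mathbf{R}^n$. Thus $\mathcal{E} = \mathcal{B}_n$. In other words, $tE \in \mathcal{B}_n$ for all $E \in \mathcal{B}_n$.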

To prove $\lambda_n(tE) = t^n \lambda_n(E)$, first consider the case $n = 1$. Lebesgue measure on $\mathbf{R}$ is a restriction of outer measure. The outer measure of a set is determined by the sum of the lengths of countable collections of intervals whose union contains the set. Multiplying the set by $t$ corresponds to multiplying each such interval by $t$, which multiplies the length of each such interval by $t$. In other words, $\lambda_1(tE) = t\lambda_1(E)$.

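Now assume $n > 1$. We will use induction on $n$ and assume that the desired result holds for $n - 1$. If $A \in \mathcal{B}_{n-1}$ and $B \in \mathcal{B}_1$, then

$$\begin{aligned} \lambda_n\bigl(t(A \times B)\bigr) &= \lambda_n\bigl((tA) \times (tB)\bigr)\\ &= \lambda_{n-1}(tA) \cdot \lambda_1(tB)\\ &= t^{n-1}\lambda_{n-1}(A) \cdot t\lambda_1(B)\\ &= t^n \lambda_n(A \times B), \end{aligned} \tag{5.42}$$

giving the desired result for $A \times B$.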


For $m \in \mathbf{Z}^+$, let $C_m$ be the open cube in $\mathbf{R}^n$ centered at the origin and with side length $m$. Let

$$\mathcal{E}_m = \{E \in \mathcal{B}_n : E \subset C_m \text{ and } \lambda_n(tE) = t^n \lambda_n(E)\}.$$

From 5.42 and using 5.13(b), we see that finite unions of measurable rectangles contained in $C_m$ are in $\mathcal{E}_m$. You should verify that $\mathcal{E}_m$ is closed under countable increasing unions (use 2.59) and countable decreasing intersections (use 2.60, whose finite measure condition holds because we are working inside $C_m$, and hence inside $tC_m$ after dilation). From 5.13 and the Monotone Class Theorem (5.17), we conclude that $\mathcal{E}_m$ is the $\sigma$-algebra on $C_m$ consisting of Borel subsets of $C_m$. Thus $\lambda_n(tE) = t^n \lambda_n(E)$ for all $E \in \mathcal{B}_n$ such that $E \subset C_m$.
Now suppose $E \in \mathcal{B}_n$. Then 2.59 implies that

$$\lambda_n(tE) = \lim_{m\to\infty} \lambda_n\bigl(t(E \cap C_m)\bigr) = t^n \lim_{m\to\infty} \lambda_n(E \cap C_m) = t^n \lambda_n(E),$$

as desired.
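The first step of the monotone class argument above concerns finite unions of measurable rectangles. For the special case of finitely many pairwise disjoint half-open boxes, the scaling law follows from 5.42 (with induction on $n$) together with additivity, and the following Python sketch checks it exactly with rational arithmetic. The boxes, the dilation factor, and the helper names are made up for illustration; the sketch says nothing about general Borel sets, which is where the Monotone Class Theorem is needed.

```python
from fractions import Fraction
from math import prod


def box_volume(box):
    """lambda_n of the half-open box [a_1,b_1) x ... x [a_n,b_n), given as [(a_i, b_i)]."""
    return prod(b - a for a, b in box)


def dilate(box, t):
    """The box t*B for t > 0: every endpoint is multiplied by t."""
    return [(t * a, t * b) for a, b in box]


def union_volume(boxes):
    """lambda_n of a union of boxes, ASSUMING the boxes are pairwise disjoint."""
    return sum(box_volume(box) for box in boxes)


# Two disjoint half-open boxes in R^3 (first coordinates in [0,1) and [1,3)).
boxes = [
    [(0, 1), (0, 2), (0, 1)],
    [(1, 3), (0, 1), (Fraction(1, 2), 1)],
]
t = Fraction(5, 2)
scaled = union_volume([dilate(box, t) for box in boxes])
print(scaled == t ** 3 * union_volume(boxes))  # True
```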
The open unit ball in $\mathbf{R}^n$ is denoted by $\mathbf{B}_n$ and is defined by

$$\mathbf{B}_n = \{(x_1,\dots,x_n) \in \mathbf{R}^n : x_1^2 + \cdots + x_n^2 < 1\}.$$

The open unit ball $\mathbf{B}_n$ is open in $\mathbf{R}^n$ (as you should verify) and thus is in the collection $\mathcal{B}_n$ of Borel sets.


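5.44 volume of the unit ball in $\mathbf{R}^n$

$$\lambda_n(\mathbf{B}_n) = \begin{cases} \dfrac{\pi^{n/2}}{(n/2)!} & \text{if } n \text{ is even},\\[2ex] \dfrac{2^{(n+1)/2}\,\pi^{(n-1)/2}}{1 \cdot 3 \cdot 5 \cdots n} & \text{if } n \text{ is odd}. \end{cases}$$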

Proof Because $\lambda_1(\mathbf{B}_1) = 2$ and $\lambda_2(\mathbf{B}_2) = \pi$, the claimed formula is correct when $n = 1$ and when $n = 2$.

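Now assume that $n > 2$. We will use induction on $n$, assuming that the claimed formula is true for smaller values of $n$. Think of $\mathbf{R}^n = \mathbf{R}^2 \times \mathbf{R}^{n-2}$ and $\lambda_n = \lambda_2 \times \lambda_{n-2}$. Then

$$\lambda_n(\mathbf{B}_n) = \int_{\mathbf{R}^2} \int_{\mathbf{R}^{n-2}} \chi_{\mathbf{B}_n}(x,y)\,dy\,dx. \tag{5.45}$$

Temporarily fix $x = (x_1,x_2) \in \mathbf{R}^2$. If $x_1^2 + x_2^2 \ge 1$, then $\chi_{\mathbf{B}_n}(x,y) = 0$ for all $y \in \mathbf{R}^{n-2}$. If $x_1^2 + x_2^2 < 1$ and $y \in \mathbf{R}^{n-2}$, then $\chi_{\mathbf{B}_n}(x,y) = 1$ if and only if $y \in (1 - x_1^2 - x_2^2)^{1/2}\mathbf{B}_{n-2}$. Thus the inner integral in 5.45 equals

$$\lambda_{n-2}\bigl((1 - x_1^2 - x_2^2)^{1/2}\mathbf{B}_{n-2}\bigr)\,\chi_{\mathbf{B}_2}(x),$$

which by 5.41 equals

$$(1 - x_1^2 - x_2^2)^{(n-2)/2}\,\lambda_{n-2}(\mathbf{B}_{n-2})\,\chi_{\mathbf{B}_2}(x).$$

Thus 5.45 becomes the equation

$$\lambda_n(\mathbf{B}_n) = \lambda_{n-2}(\mathbf{B}_{n-2}) \int_{\mathbf{B}_2} (1 - x_1^2 - x_2^2)^{(n-2)/2}\,d\lambda_2(x_1,x_2).$$

To evaluate this integral, switch to the usual polar coordinates that you learned about in calculus ($d\lambda_2 = r\,dr\,d\theta$), getting

$$\begin{aligned} \lambda_n(\mathbf{B}_n) &= \lambda_{n-2}(\mathbf{B}_{n-2}) \int_{-\pi}^{\pi} \int_0^1 (1 - r^2)^{(n-2)/2}\,r\,dr\,d\theta\\ &= \frac{2\pi}{n}\,\lambda_{n-2}(\mathbf{B}_{n-2}), \end{aligned}$$

where the second equality holds because $\int_0^1 (1 - r^2)^{(n-2)/2}\,r\,dr = \frac{1}{n}$.

The last equation and the induction hypothesis give the desired result.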
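The recursion $\lambda_n(\mathbf{B}_n) = \frac{2\pi}{n}\lambda_{n-2}(\mathbf{B}_{n-2})$ obtained at the end of the proof, together with the base cases $\lambda_1(\mathbf{B}_1) = 2$ and $\lambda_2(\mathbf{B}_2) = \pi$, determines every value. The Python sketch below (function names hypothetical) evaluates both the recursion and the closed form in 5.44 in floating-point arithmetic, so agreement is only up to rounding error.

```python
import math


def ball_volume_recursive(n):
    """lambda_n(B_n) via lambda_n(B_n) = (2*pi/n) * lambda_{n-2}(B_{n-2})."""
    if n == 1:
        return 2.0
    if n == 2:
        return math.pi
    return 2 * math.pi / n * ball_volume_recursive(n - 2)


def ball_volume_closed_form(n):
    """lambda_n(B_n) from the two cases of 5.44."""
    if n % 2 == 0:
        return math.pi ** (n // 2) / math.factorial(n // 2)
    odd_product = math.prod(range(1, n + 1, 2))  # 1 * 3 * 5 * ... * n
    return 2 ** ((n + 1) // 2) * math.pi ** ((n - 1) // 2) / odd_product


for n in range(1, 6):
    print(n, ball_volume_recursive(n), ball_volume_closed_form(n))
```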
The table here gives the first five values of $\lambda_n(\mathbf{B}_n)$, using 5.44. The last column of this table gives a decimal approximation to $\lambda_n(\mathbf{B}_n)$, accurate to two digits after the decimal point. From this table, you might guess that $\lambda_n(\mathbf{B}_n)$ is an increasing function of $n$, especially because the smallest cube containing the ball $\mathbf{B}_n$ has $n$-dimensional Lebesgue measure $2^n$. However, Exercise 12 in this section shows that $\lambda_n(\mathbf{B}_n)$ behaves much differently.

$n$   $\lambda_n(\mathbf{B}_n)$   approximation
1     $2$                         2.00
2     $\pi$                       3.14
3     $\frac{4}{3}\pi$            4.19
4     $\frac{1}{2}\pi^2$          4.93
5     $\frac{8}{15}\pi^2$         5.26


Equality of Mixed Partial Derivatives Via Fubini’s Theorem 


5.46 Definition partial derivatives; $D_1 f$ and $D_2 f$

Suppose $G$ is an open subset of $\mathbf{R}^2$ and $f\colon G \to \mathbf{R}$ is a function. For $(x,y) \in G$, the partial derivatives $(D_1 f)(x,y)$ and $(D_2 f)(x,y)$ are defined by

$$(D_1 f)(x,y) = \lim_{t\to 0} \frac{f(x+t,y) - f(x,y)}{t}$$

and

$$(D_2 f)(x,y) = \lim_{t\to 0} \frac{f(x,y+t) - f(x,y)}{t}$$

if these limits exist.

Using the notation for the cross section of a function (see 5.7), we could write the definitions of $D_1$ and $D_2$ in the following form:

$$(D_1 f)(x,y) = ([f]^y)'(x) \quad\text{and}\quad (D_2 f)(x,y) = ([f]_x)'(y).$$


5.47 Example partial derivatives of $x^y$

Let $G = \{(x,y) \in \mathbf{R}^2 : x > 0\}$ and define $f\colon G \to \mathbf{R}$ by $f(x,y) = x^y$. Then

$$(D_1 f)(x,y) = y x^{y-1} \quad\text{and}\quad (D_2 f)(x,y) = x^y \ln x,$$

as you should verify. Taking partial derivatives of those partial derivatives, we have

$$(D_2(D_1 f))(x,y) = x^{y-1} + y x^{y-1} \ln x$$

and

$$(D_1(D_2 f))(x,y) = x^{y-1} + y x^{y-1} \ln x,$$

as you should also verify. The last two equations show that $D_1(D_2 f) = D_2(D_1 f)$ as functions on $G$.
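The claims marked “as you should verify” in Example 5.47 can be spot-checked numerically by comparing each formula with a central difference quotient at a few points of $G$. The Python sketch below does this; the step size $h = 10^{-6}$, the sample points, and all function names are arbitrary choices for illustration, and a small numerical discrepancy is evidence, not a proof.

```python
import math


def f(x, y):
    """f(x, y) = x**y on G = {(x, y) : x > 0}."""
    return x ** y


def d1_formula(x, y):
    return y * x ** (y - 1)


def d2_formula(x, y):
    return x ** y * math.log(x)


def mixed_formula(x, y):
    """Common value of D_2(D_1 f) and D_1(D_2 f) claimed in Example 5.47."""
    return x ** (y - 1) + y * x ** (y - 1) * math.log(x)


def central_difference(g, x, y, which, h=1e-6):
    """Approximate (D_1 g)(x, y) when which == 1 and (D_2 g)(x, y) when which == 2."""
    if which == 1:
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


for x, y in [(2.0, 1.5), (0.5, -0.7), (3.0, 2.0)]:
    # Each printed number is the size of a discrepancy; all should be tiny.
    print(
        abs(central_difference(f, x, y, 1) - d1_formula(x, y)),
        abs(central_difference(f, x, y, 2) - d2_formula(x, y)),
        abs(central_difference(d1_formula, x, y, 2) - mixed_formula(x, y)),
        abs(central_difference(d2_formula, x, y, 1) - mixed_formula(x, y)),
    )
```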
In the example above, the two mixed partial derivatives turn out to equal each
other, even though the intermediate results look quite different. The next result shows 
that the behavior in the example above is typical rather than a coincidence. 

Some proofs of the result below do not use Fubini’s Theorem. However, Fubini’s 
Theorem leads to the clean proof below. 

The integrals that appear in the proof below make sense because continuous real-valued functions on $\mathbf{R}^2$ are measurable (because for a continuous function, the inverse image of each open set is open) and because continuous real-valued functions on closed bounded subsets of $\mathbf{R}^2$ are bounded.

Although the continuity hypotheses in the result below can be slightly weakened, they cannot be eliminated, as shown by Exercise 14 in this section.


5.48 equality of mixed partial derivatives

Suppose $G$ is an open subset of $\mathbf{R}^2$ and $f\colon G \to \mathbf{R}$ is a function such that $D_1 f$, $D_2 f$, $D_1(D_2 f)$, and $D_2(D_1 f)$ all exist and are continuous functions on $G$. Then

$$D_1(D_2 f) = D_2(D_1 f)$$

on $G$.


Proof Fix $(a,b) \in G$. For $\delta > 0$, let $S_\delta = [a,a+\delta] \times [b,b+\delta]$. If $S_\delta \subset G$, then

$$\begin{aligned} \int_{S_\delta} D_1(D_2 f)\,d\lambda_2 &= \int_b^{b+\delta} \int_a^{a+\delta} \bigl(D_1(D_2 f)\bigr)(x,y)\,dx\,dy\\ &= \int_b^{b+\delta} \bigl((D_2 f)(a+\delta,y) - (D_2 f)(a,y)\bigr)\,dy\\ &= f(a+\delta,b+\delta) - f(a+\delta,b) - f(a,b+\delta) + f(a,b), \end{aligned}$$

where the first equality comes from Fubini’s Theorem (5.32) and the second and third equalities come from the Fundamental Theorem of Calculus.

A similar calculation of $\int_{S_\delta} D_2(D_1 f)\,d\lambda_2$ yields the same result. Thus

$$\int_{S_\delta} \bigl[D_1(D_2 f) - D_2(D_1 f)\bigr]\,d\lambda_2 = 0$$

for all $\delta > 0$ such that $S_\delta \subset G$. If $(D_1(D_2 f))(a,b) > (D_2(D_1 f))(a,b)$, then by the continuity of $D_1(D_2 f)$ and $D_2(D_1 f)$, the integrand in the equation above is positive on $S_\delta$ for $\delta$ sufficiently small, which contradicts the integral above equaling 0. Similarly, the inequality $(D_1(D_2 f))(a,b) < (D_2(D_1 f))(a,b)$ also contradicts the equation above for small $\delta$. Thus we conclude that

$$\bigl(D_1(D_2 f)\bigr)(a,b) = \bigl(D_2(D_1 f)\bigr)(a,b),$$

as desired.
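Dividing the four-point expression $f(a+\delta,b+\delta) - f(a+\delta,b) - f(a,b+\delta) + f(a,b)$ from the proof by $\delta^2 = \lambda_2(S_\delta)$ gives the average value of $D_1(D_2 f)$ over $S_\delta$ (and, by the same proof, of $D_2(D_1 f)$); continuity then forces these averages to approach the common value at $(a,b)$ as $\delta \to 0$. The Python sketch below watches this happen for the function $f(x,y) = x^y$ of Example 5.47 at the arbitrarily chosen point $(a,b) = (2, 1.5)$; the function names are hypothetical.

```python
import math


def f(x, y):
    return x ** y


def mixed_formula(x, y):
    """D_1(D_2 f) = D_2(D_1 f) for f(x, y) = x**y, as computed in Example 5.47."""
    return x ** (y - 1) * (1 + y * math.log(x))


def square_quotient(g, a, b, delta):
    """[g(a+d,b+d) - g(a+d,b) - g(a,b+d) + g(a,b)] / d**2.

    By the proof of 5.48 this is the average of D_1(D_2 g) over
    S_delta = [a, a+d] x [b, b+d], when the hypotheses of 5.48 hold.
    """
    return (g(a + delta, b + delta) - g(a + delta, b)
            - g(a, b + delta) + g(a, b)) / delta ** 2


for delta in (1e-1, 1e-2, 1e-3, 1e-4):
    print(delta, square_quotient(f, 2.0, 1.5, delta), mixed_formula(2.0, 1.5))
```

The quotient is a finite-difference approximation whose error shrinks roughly in proportion to $\delta$ here, until rounding error takes over for very small $\delta$.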
EXERCISES 5C


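1 Show that a set $G \subset \mathbf{R}^n$ is open in $\mathbf{R}^n$ if and only if for each $(b_1,\dots,b_n) \in G$, there exists $r > 0$ such that

$$\Bigl\{(a_1,\dots,a_n) \in \mathbf{R}^n : \sqrt{(a_1 - b_1)^2 + \cdots + (a_n - b_n)^2} < r\Bigr\} \subset G.$$

2 Show that there exists a set $E \subset \mathbf{R}^2$ (thinking of $\mathbf{R}^2$ as equal to $\mathbf{R} \times \mathbf{R}$) such that the cross sections $[E]_a$ and $[E]^a$ are open subsets of $\mathbf{R}$ for every $a \in \mathbf{R}$, but $E \notin \mathcal{B}_2$.

3 Suppose $(X,\mathcal{S})$, $(Y,\mathcal{T})$, and $(Z,\mathcal{U})$ are measurable spaces. We can define $\mathcal{S}\otimes\mathcal{T}\otimes\mathcal{U}$ to be the smallest $\sigma$-algebra on $X \times Y \times Z$ that contains

$$\{A \times B \times C : A \in \mathcal{S}, B \in \mathcal{T}, C \in \mathcal{U}\}.$$

Prove that if we make the obvious identifications of the products $(X \times Y) \times Z$ and $X \times (Y \times Z)$ with $X \times Y \times Z$, then

$$\mathcal{S}\otimes\mathcal{T}\otimes\mathcal{U} = (\mathcal{S}\otimes\mathcal{T})\otimes\mathcal{U} = \mathcal{S}\otimes(\mathcal{T}\otimes\mathcal{U}).$$

4 Show that Lebesgue measure on $\mathbf{R}^n$ is translation invariant. More precisely, show that if $E \in \mathcal{B}_n$ and $a \in \mathbf{R}^n$, then $a + E \in \mathcal{B}_n$ and $\lambda_n(a + E) = \lambda_n(E)$, where

$$a + E = \{a + x : x \in E\}.$$

5 Suppose $f\colon \mathbf{R}^n \to \mathbf{R}$ is $\mathcal{B}_n$-measurable and $t \in \mathbf{R} \setminus \{0\}$. Define $f_t\colon \mathbf{R}^n \to \mathbf{R}$ by $f_t(x) = f(tx)$.

(a) Prove that $f_t$ is $\mathcal{B}_n$-measurable.

(b) Prove that if $\int_{\mathbf{R}^n} f\,d\lambda_n$ is defined, then

$$\int_{\mathbf{R}^n} f_t\,d\lambda_n = \frac{1}{|t|^n} \int_{\mathbf{R}^n} f\,d\lambda_n.$$

6 Suppose $\lambda$ denotes Lebesgue measure on $(\mathbf{R}, \mathcal{L})$, where $\mathcal{L}$ is the $\sigma$-algebra of Lebesgue measurable subsets of $\mathbf{R}$. Show that there exist subsets $E$ and $F$ of $\mathbf{R}^2$ such that

• $F \in \mathcal{L}\otimes\mathcal{L}$ and $(\lambda\times\lambda)(F) = 0$;
• $E \subset F$ but $E \notin \mathcal{L}\otimes\mathcal{L}$.

[The measure space $(\mathbf{R}, \mathcal{L}, \lambda)$ has the property that every subset of a set with measure 0 is measurable. This exercise asks you to show that the measure space $(\mathbf{R}^2, \mathcal{L}\otimes\mathcal{L}, \lambda\times\lambda)$ does not have this property.]

7 Suppose $M \in \mathbf{Z}^+$. Verify that the collection of sets $\mathcal{E}_M$ that appears in the proof of 5.41 is a monotone class.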
8 Show that the open unit ball in $\mathbf{R}^n$ is an open subset of $\mathbf{R}^n$.

9 Suppose $G_1$ is a nonempty subset of $\mathbf{R}^m$ and $G_2$ is a nonempty subset of $\mathbf{R}^n$. Prove that $G_1 \times G_2$ is an open subset of $\mathbf{R}^m \times \mathbf{R}^n$ if and only if $G_1$ is an open subset of $\mathbf{R}^m$ and $G_2$ is an open subset of $\mathbf{R}^n$.

[One direction of this result was already proved (see 5.36); both directions are stated here to make the result look prettier and to be comparable to the next exercise, where neither direction has been proved.]

10 Suppose $F_1$ is a nonempty subset of $\mathbf{R}^m$ and $F_2$ is a nonempty subset of $\mathbf{R}^n$. Prove that $F_1 \times F_2$ is a closed subset of $\mathbf{R}^m \times \mathbf{R}^n$ if and only if $F_1$ is a closed subset of $\mathbf{R}^m$ and $F_2$ is a closed subset of $\mathbf{R}^n$.

11 Suppose $E$ is a subset of $\mathbf{R}^m \times \mathbf{R}^n$ and

$$A = \{x \in \mathbf{R}^m : (x,y) \in E \text{ for some } y \in \mathbf{R}^n\}.$$

(a) Prove that if $E$ is an open subset of $\mathbf{R}^m \times \mathbf{R}^n$, then $A$ is an open subset of $\mathbf{R}^m$.
(b) Prove or give a counterexample: If $E$ is a closed subset of $\mathbf{R}^m \times \mathbf{R}^n$, then $A$ is a closed subset of $\mathbf{R}^m$.

12 (a) Prove that $\lim_{n\to\infty} \lambda_n(\mathbf{B}_n) = 0$.
(b) Find the value of $n$ that maximizes $\lambda_n(\mathbf{B}_n)$.

13 For readers familiar with the gamma function $\Gamma$: Prove that

$$\lambda_n(\mathbf{B}_n) = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}$$

for every positive integer $n$.

14 Define $f\colon \mathbf{R}^2 \to \mathbf{R}$ by

$$f(x,y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x,y) \ne (0,0),\\[1ex] 0 & \text{if } (x,y) = (0,0). \end{cases}$$

(a) Prove that $D_1(D_2 f)$ and $D_2(D_1 f)$ exist everywhere on $\mathbf{R}^2$.
(b) Show that $(D_1(D_2 f))(0,0) \ne (D_2(D_1 f))(0,0)$.
(c) Explain why (b) does not violate 5.48.
Chapter 6
Banach Spaces


We begin this chapter with a quick review of the essentials of metric spaces. Then 
we extend our results on measurable functions and integration to complex-valued 
functions. After that, we rapidly review the framework of vector spaces, which 
allows us to consider natural collections of measurable functions that are closed under 
addition and scalar multiplication. 

Normed vector spaces and Banach spaces, which are introduced in the third section 
of this chapter, play a hugely important role in modern analysis. Most interest focuses 
on linear maps on these vector spaces. Key results about linear maps that we develop 
in this chapter include the Hahn—Banach Theorem, the Open Mapping Theorem, the 
Closed Graph Theorem, and the Principle of Uniform Boundedness. 


Market square in Lwów, a city that has been in several countries because of changing international boundaries. Before World War I, Lwów was in Austria-Hungary. During the period between World War I and World War II, Lwów was in Poland. During this time, mathematicians in Lwów, particularly Stefan Banach (1892-1945) and his colleagues, developed the basic results of modern functional analysis. After World War II, Lwów was in the USSR. Now Lwów is in Ukraine and is called Lviv.
6A Metric Spaces


Open Sets, Closed Sets, and Continuity 


Much of analysis takes place in the context of a metric space, which is a set with a 
notion of distance that satisfies certain properties. The properties we would like a 
distance function to have are captured in the next definition, where you should think 
of $d(f,g)$ as measuring the distance between $f$ and $g$.

Specifically, we would like the distance between two elements of our metric space 
to be a nonnegative number that is 0 if and only if the two elements are the same. We 
would like the distance between two elements not to depend on the order in which 
we list them. Finally, we would like a triangle inequality (the last bullet point below), 
which states that the distance between two elements is less than or equal to the sum 
of the distances obtained when we insert an intermediate element. 

Now we are ready for the formal definition. 


6.1 Definition metric space

A metric on a nonempty set $V$ is a function $d\colon V \times V \to [0,\infty)$ such that

• $d(f,f) = 0$ for all $f \in V$;
• if $f, g \in V$ and $d(f,g) = 0$, then $f = g$;
• $d(f,g) = d(g,f)$ for all $f, g \in V$;
• $d(f,h) \le d(f,g) + d(g,h)$ for all $f, g, h \in V$.

A metric space is a pair $(V,d)$, where $V$ is a nonempty set and $d$ is a metric on $V$.


6.2 Example metric spaces

• Suppose $V$ is a nonempty set. Define $d$ on $V \times V$ by setting $d(f,g)$ to be 1 if $f \ne g$ and to be 0 if $f = g$. Then $d$ is a metric on $V$.
• Define $d$ on $\mathbf{R} \times \mathbf{R}$ by $d(x,y) = |x - y|$. Then $d$ is a metric on $\mathbf{R}$.
• For $n \in \mathbf{Z}^+$, define $d$ on $\mathbf{R}^n \times \mathbf{R}^n$ by
$$d\bigl((x_1,\dots,x_n),(y_1,\dots,y_n)\bigr) = \max\{|x_1 - y_1|,\dots,|x_n - y_n|\}.$$
Then $d$ is a metric on $\mathbf{R}^n$.
• Define $d$ on $C([0,1]) \times C([0,1])$ by $d(f,g) = \sup\{|f(t) - g(t)| : t \in [0,1]\}$; here $C([0,1])$ is the set of continuous real-valued functions on $[0,1]$. Then $d$ is a metric on $C([0,1])$.
• Define $d$ on $\ell^1 \times \ell^1$ by $d\bigl((a_1,a_2,\ldots),(b_1,b_2,\ldots)\bigr) = \sum_{k=1}^{\infty} |a_k - b_k|$; here $\ell^1$ is the set of sequences $(a_1,a_2,\ldots)$ of real numbers such that $\sum_{k=1}^{\infty} |a_k| < \infty$. Then $d$ is a metric on $\ell^1$.
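Exercise 1 in this section asks for proofs that the functions in Example 6.2 are metrics. As a complement (not a replacement), the Python sketch below checks the four conditions of 6.1 on finite samples of points for the first three examples; integer sample points keep the arithmetic exact. The function names are hypothetical, and an empty list of violations shows only that no counterexample occurs among the sampled points.

```python
from itertools import product


def discrete_metric(f, g):
    return 0 if f == g else 1


def absolute_value_metric(x, y):
    return abs(x - y)


def max_metric(x, y):
    """The metric on R^n from the third bullet point: max_i |x_i - y_i|."""
    return max(abs(a - b) for a, b in zip(x, y))


def axiom_violations(d, points):
    """Collect failures of the conditions in 6.1 on a finite sample of points."""
    problems = []
    for f in points:
        if d(f, f) != 0:
            problems.append(("d(f,f) != 0", f))
    for f, g in product(points, repeat=2):
        if d(f, g) < 0:
            problems.append(("negative value", f, g))
        if f != g and d(f, g) == 0:
            problems.append(("d(f,g) = 0 but f != g", f, g))
        if d(f, g) != d(g, f):
            problems.append(("not symmetric", f, g))
    for f, g, h in product(points, repeat=3):
        if d(f, h) > d(f, g) + d(g, h):
            problems.append(("triangle inequality fails", f, g, h))
    return problems


print(axiom_violations(discrete_metric, ["a", "b", "c"]))            # []
print(axiom_violations(absolute_value_metric, range(-3, 4)))         # []
print(axiom_violations(max_metric, list(product(range(-2, 3), repeat=2))))  # []
```

For instance, `axiom_violations(lambda s, t: (s - t) ** 2, range(0, 3))` is nonempty, because $(0-2)^2 = 4 > (0-1)^2 + (1-2)^2 = 2$ violates the triangle inequality.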
The material in this section is probably review for most readers of this book. Thus more details than usual are left to the reader to verify. Verifying those details and doing the exercises is the best way to solidify your understanding of these concepts. You should be able to transfer familiar definitions and proofs from the context of $\mathbf{R}$ or $\mathbf{R}^n$ to the context of a metric space.

We will need to use a metric space’s topological features, which we introduce now.

This book often uses symbols such as $f, g, h$ as generic elements of a generic metric space because many of the important metric spaces in analysis are sets of functions; for example, see the fourth bullet point of Example 6.2.


6.3 Definition open ball; $B(f,r)$

Suppose $(V,d)$ is a metric space, $f \in V$, and $r > 0$.

• The open ball centered at $f$ with radius $r$ is denoted $B(f,r)$ and is defined by
$$B(f,r) = \{g \in V : d(f,g) < r\}.$$
• The closed ball centered at $f$ with radius $r$ is denoted $\overline{B}(f,r)$ and is defined by
$$\overline{B}(f,r) = \{g \in V : d(f,g) \le r\}.$$


Abusing terminology, many books (including this one) include phrases such as 
suppose V is a metric space without mentioning the metric d. When that happens, 
you should assume that a metric d lurks nearby, even if it is not explicitly named. 

Our next definition declares a subset of a metric space to be open if every element 
in the subset is the center of an open ball that is contained in the set. 


6.4 Definition open set

A subset $G$ of a metric space $V$ is called open if for every $f \in G$, there exists $r > 0$ such that $B(f,r) \subset G$.


6.5 open balls are open

Suppose $V$ is a metric space, $f \in V$, and $r > 0$. Then $B(f,r)$ is an open subset of $V$.

Proof Suppose $g \in B(f,r)$. We need to show that an open ball centered at $g$ is contained in $B(f,r)$. Note that $r - d(f,g) > 0$ because $d(f,g) < r$. If $h \in B(g, r - d(f,g))$, then

$$d(f,h) \le d(f,g) + d(g,h) < d(f,g) + \bigl(r - d(f,g)\bigr) = r,$$

which implies that $h \in B(f,r)$. Thus $B(g, r - d(f,g)) \subset B(f,r)$, which implies that $B(f,r)$ is open.




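Closed sets are defined in terms of open sets.

6.6 Definition closed subset

A subset of a metric space $V$ is called closed if its complement in $V$ is open.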


For example, each closed ball $\overline{B}(f,r)$ in a metric space is closed, as you are asked to prove in Exercise 3.

Now we define the closure of a subset of a metric space.

6.7 Definition closure

Suppose $V$ is a metric space and $E \subset V$. The closure of $E$, denoted $\overline{E}$, is defined by

$$\overline{E} = \{g \in V : B(g,\varepsilon) \cap E \ne \varnothing \text{ for every } \varepsilon > 0\}.$$


Limits in a metric space are defined by reducing to the context of real numbers, where limits have already been defined.

6.8 Definition limit in a metric space

Suppose $(V,d)$ is a metric space, $f_1, f_2, \ldots$ is a sequence in $V$, and $f \in V$. Then

$$\lim_{k\to\infty} f_k = f \quad\text{means}\quad \lim_{k\to\infty} d(f_k, f) = 0.$$

In other words, a sequence $f_1, f_2, \ldots$ in $V$ converges to $f \in V$ if for every $\varepsilon > 0$, there exists $n \in \mathbf{Z}^+$ such that

$$d(f_k, f) < \varepsilon \quad\text{for all integers } k \ge n.$$


The next result states that the closure of a set is the collection of all limits of elements of the set. Also, a set is closed if and only if it equals its closure. The proof of the next result is left as an exercise that provides good practice in using these concepts.

6.9 closure

Suppose $V$ is a metric space and $E \subset V$. Then

(a) $\overline{E} = \{g \in V : \text{there exist } f_1, f_2, \ldots \text{ in } E \text{ such that } \lim_{k\to\infty} f_k = g\}$;

(b) $\overline{E}$ is the intersection of all closed subsets of $V$ that contain $E$;

(c) $\overline{E}$ is a closed subset of $V$;

(d) $E$ is closed if and only if $\overline{E} = E$;

(e) $E$ is closed if and only if $E$ contains the limit of every convergent sequence of elements of $E$.
The definition of continuity that follows uses the same pattern as the definition for a function from a subset of $\mathbf{R}$ to $\mathbf{R}$.

6.10 Definition continuity

Suppose $(V, d_V)$ and $(W, d_W)$ are metric spaces and $T\colon V \to W$ is a function.

• For $f \in V$, the function $T$ is called continuous at $f$ if for every $\varepsilon > 0$, there exists $\delta > 0$ such that
$$d_W\bigl(T(f), T(g)\bigr) < \varepsilon$$
for all $g \in V$ with $d_V(f,g) < \delta$.

• The function $T$ is called continuous if $T$ is continuous at $f$ for every $f \in V$.


The next result gives equivalent conditions for continuity. Recall that $T^{-1}(E)$ is called the inverse image of $E$ and is defined to be $\{f \in V : T(f) \in E\}$. Thus the equivalence of (a) and (c) below could be restated as saying that a function is continuous if and only if the inverse image of every open set is open. The equivalence of (a) and (d) below could be restated as saying that a function is continuous if and only if the inverse image of every closed set is closed.


6.11 equivalent conditions for continuity

Suppose $V$ and $W$ are metric spaces and $T\colon V \to W$ is a function. Then the following are equivalent:

(a) $T$ is continuous.

(b) $\lim_{k\to\infty} f_k = f$ in $V$ implies $\lim_{k\to\infty} T(f_k) = T(f)$ in $W$.

(c) $T^{-1}(G)$ is an open subset of $V$ for every open set $G \subset W$.

(d) $T^{-1}(F)$ is a closed subset of $V$ for every closed set $F \subset W$.


Proof We first prove that (b) implies (d). Suppose (b) holds. Suppose $F$ is a closed subset of $W$. We need to prove that $T^{-1}(F)$ is closed. To do this, suppose $f_1, f_2, \ldots$ is a sequence in $T^{-1}(F)$ and $\lim_{k\to\infty} f_k = f$ for some $f \in V$. Because (b) holds, we know that $\lim_{k\to\infty} T(f_k) = T(f)$. Because $f_k \in T^{-1}(F)$ for each $k \in \mathbf{Z}^+$, we know that $T(f_k) \in F$ for each $k \in \mathbf{Z}^+$. Because $F$ is closed, this implies that $T(f) \in F$. Thus $f \in T^{-1}(F)$, which implies that $T^{-1}(F)$ is closed [by 6.9(e)], completing the proof that (b) implies (d).

The proof that (c) and (d) are equivalent follows from the equation

$$T^{-1}(W \setminus E) = V \setminus T^{-1}(E)$$

for every $E \subset W$ and the fact that a set is open if and only if its complement (in the appropriate metric space) is closed.

The proofs of the remaining parts of this result are left as an exercise that should help strengthen your understanding of these concepts.
Cauchy Sequences and Completeness

The next definition is useful for showing (in some metric spaces) that a sequence has a limit, even when we do not have a good candidate for that limit.

6.12 Definition Cauchy sequence

A sequence $f_1, f_2, \ldots$ in a metric space $(V,d)$ is called a Cauchy sequence if for every $\varepsilon > 0$, there exists $n \in \mathbf{Z}^+$ such that $d(f_j, f_k) < \varepsilon$ for all integers $j \ge n$ and $k \ge n$.


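6.13 every convergent sequence is a Cauchy sequence

Every convergent sequence in a metric space is a Cauchy sequence.

Proof Suppose $\lim_{k\to\infty} f_k = f$ in a metric space $(V,d)$. Suppose $\varepsilon > 0$. Then there exists $n \in \mathbf{Z}^+$ such that $d(f_k, f) < \frac{\varepsilon}{2}$ for all $k \ge n$. If $j, k \in \mathbf{Z}^+$ are such that $j \ge n$ and $k \ge n$, then

$$d(f_j, f_k) \le d(f_j, f) + d(f, f_k) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

Thus $f_1, f_2, \ldots$ is a Cauchy sequence, completing the proof.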


Metric spaces that satisfy the converse of the result above have a special name. 


6.14 Definition complete metric space


A metric space V is called complete if every Cauchy sequence in V converges to 
some element of V. 


6.15 Example

• All five of the metric spaces in Example 6.2 are complete, as you should verify.

• The metric space $\mathbf{Q}$, with metric defined by $d(x,y) = |x - y|$, is not complete. To see this, for $k \in \mathbf{Z}^+$ let
$$x_k = \frac{1}{10^{1!}} + \frac{1}{10^{2!}} + \cdots + \frac{1}{10^{k!}}.$$
If $j < k$, then
$$|x_k - x_j| = \frac{1}{10^{(j+1)!}} + \cdots + \frac{1}{10^{k!}} < \frac{2}{10^{(j+1)!}}.$$
Thus $x_1, x_2, \ldots$ is a Cauchy sequence in $\mathbf{Q}$. However, $x_1, x_2, \ldots$ does not converge to an element of $\mathbf{Q}$ because the limit of this sequence would have a decimal expansion $0.110001000000000000000001\ldots$ that is neither a terminating decimal nor a repeating decimal. Thus $\mathbf{Q}$ is not a complete metric space.
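The two computations in the second bullet point can be reproduced with exact rational arithmetic. The Python sketch below (the names `x_seq` and `cauchy_bound_holds` are hypothetical) checks the estimate $|x_k - x_j| < 2/10^{(j+1)!}$ for small $j < k$ and prints the first $4! = 24$ decimal digits of $x_4$. No finite computation can show that the limit is irrational; that part of the argument rests on the nonrepeating decimal expansion.

```python
from fractions import Fraction
from math import factorial


def x_seq(k):
    """x_k = 1/10^(1!) + 1/10^(2!) + ... + 1/10^(k!), an element of Q."""
    return sum(Fraction(1, 10 ** factorial(i)) for i in range(1, k + 1))


def cauchy_bound_holds(j, k):
    """Check the estimate |x_k - x_j| < 2/10^((j+1)!) from the text, for j < k."""
    return abs(x_seq(k) - x_seq(j)) < Fraction(2, 10 ** factorial(j + 1))


print(all(cauchy_bound_holds(j, k) for k in range(2, 6) for j in range(1, k)))  # True

# Scaling x_4 by 10^(4!) exposes its first 24 decimal digits;
# the digits in positions 1, 2, 6, and 24 are 1 and all others are 0.
print(x_seq(4) * 10 ** factorial(4))
```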
Entrance to the École Polytechnique (Paris), where Augustin-Louis Cauchy (1789-1857) was a student and a faculty member. Cauchy wrote almost 800 mathematics papers and the textbook Cours d’Analyse (published in 1821), which greatly influenced the development of analysis.
Every nonempty subset of a metric space is a metric space. Specifically, suppose $(V,d)$ is a metric space and $U$ is a nonempty subset of $V$. Then restricting $d$ to $U \times U$ gives a metric on $U$. Unless stated otherwise, you should assume that the metric on a subset is this restricted metric that the subset inherits from the bigger set.

Combining the two parts of the result below shows that a subset of a complete metric space is complete if and only if it is closed.


6.16 connection between complete and closed

(a) A complete subset of a metric space is closed.

(b) A closed subset of a complete metric space is complete.

Proof We begin with a proof of (a). Suppose $U$ is a complete subset of a metric space $V$. Suppose $f_1, f_2, \ldots$ is a sequence in $U$ that converges to some $g \in V$. Then $f_1, f_2, \ldots$ is a Cauchy sequence in $U$ (by 6.13). Hence by the completeness of $U$, the sequence $f_1, f_2, \ldots$ converges to some element of $U$, which must be $g$ (see Exercise 7). Hence $g \in U$. Now 6.9(e) implies that $U$ is a closed subset of $V$, completing the proof of (a).

To prove (b), suppose $U$ is a closed subset of a complete metric space $V$. To show that $U$ is complete, suppose $f_1, f_2, \ldots$ is a Cauchy sequence in $U$. Then $f_1, f_2, \ldots$ is also a Cauchy sequence in $V$. By the completeness of $V$, this sequence converges to some $f \in V$. Because $U$ is closed, this implies that $f \in U$ (see 6.9). Thus the Cauchy sequence $f_1, f_2, \ldots$ converges to an element of $U$, showing that $U$ is complete. Hence (b) has been proved.
EXERCISES 6A
1 Verify that each of the claimed metrics in Example 6.2 is indeed a metric.

2 Prove that every finite subset of a metric space is closed.

3 Prove that every closed ball in a metric space is closed.

4 Suppose V is a metric space.


(a) Prove that the union of each collection of open subsets of V is an open 
subset of V. 


(b) Prove that the intersection of each finite collection of open subsets of V is 
an open subset of V. 


5 Suppose V is a metric space.
(a) Prove that the intersection of each collection of closed subsets of V is a 
closed subset of V. 


(b) Prove that the union of each finite collection of closed subsets of V is a 
closed subset of V. 


6 (a) Prove that if $V$ is a metric space, $f \in V$, and $r > 0$, then $\overline{B(f, r)} \subseteq \overline{B}(f, r)$.
(b) Give an example of a metric space $V$, $f \in V$, and $r > 0$ such that
$$\overline{B(f, r)} \neq \overline{B}(f, r).$$
7 Show that a sequence in a metric space has at most one limit.

8 Prove 6.9.


9 Prove that each open subset of a metric space V is the union of some sequence
of closed subsets of V. 


10 Prove or give a counterexample: If $V$ is a metric space and $U, W$ are subsets of $V$, then $\overline{U} \cup \overline{W} = \overline{U \cup W}$.


11 Prove or give a counterexample: If $V$ is a metric space and $U, W$ are subsets of $V$, then $\overline{U} \cap \overline{W} = \overline{U \cap W}$.


12 Suppose $(U, d_U)$, $(V, d_V)$, and $(W, d_W)$ are metric spaces. Suppose also that $T\colon U \to V$ and $S\colon V \to W$ are continuous functions.
(a) Using the definition of continuity, show that $S \circ T\colon U \to W$ is continuous.

(b) Using the equivalence of 6.11(a) and 6.11(b), show that $S \circ T\colon U \to W$ is continuous.

(c) Using the equivalence of 6.11(a) and 6.11(c), show that $S \circ T\colon U \to W$ is continuous.


13 Prove the parts of 6.11 that were not proved in the text.
14 Suppose a Cauchy sequence in a metric space has a convergent subsequence. Prove that the Cauchy sequence converges.


15 Verify that all five of the metric spaces in Example 6.2 are complete metric
spaces. 


16 Suppose $(U, d)$ is a metric space. Let $W$ denote the set of all Cauchy sequences of elements of $U$.

(a) For $(f_1, f_2, \ldots)$ and $(g_1, g_2, \ldots)$ in $W$, define $(f_1, f_2, \ldots) \equiv (g_1, g_2, \ldots)$ to mean that
$$\lim_{k\to\infty} d(f_k, g_k) = 0.$$
Show that $\equiv$ is an equivalence relation on $W$.

(b) Let $V$ denote the set of equivalence classes of elements of $W$ under the equivalence relation above. For $(f_1, f_2, \ldots) \in W$, let $\widehat{(f_1, f_2, \ldots)}$ denote the equivalence class of $(f_1, f_2, \ldots)$. Define $d_V\colon V \times V \to [0, \infty)$ by
$$d_V\bigl(\widehat{(f_1, f_2, \ldots)}, \widehat{(g_1, g_2, \ldots)}\bigr) = \lim_{k\to\infty} d(f_k, g_k).$$
Show that this definition of $d_V$ makes sense and that $d_V$ is a metric on $V$.

(c) Show that $(V, d_V)$ is a complete metric space.

(d) Show that the map from $U$ to $V$ that takes $f \in U$ to $\widehat{(f, f, f, \ldots)}$ preserves distances, meaning that
$$d(f, g) = d_V\bigl(\widehat{(f, f, f, \ldots)}, \widehat{(g, g, g, \ldots)}\bigr)$$
for all $f, g \in U$.

(e) Explain why (d) shows that every metric space is a subset of some complete metric space.


Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 
4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial 
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give 
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license 
and indicate if changes were made. 


The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.




6B Vector Spaces 


Integration of Complex-Valued Functions 


Complex numbers were invented so that we can take square roots of negative numbers. The idea is to assume we have a square root of $-1$, denoted $i$, that obeys the usual rules of arithmetic. Here are the formal definitions:


6.17 Definition complex numbers; $\mathbf{C}$

• A complex number is an ordered pair $(a, b)$, where $a, b \in \mathbf{R}$, but we write this as $a + bi$.

• The set of all complex numbers is denoted by $\mathbf{C}$:
$$\mathbf{C} = \{a + bi : a, b \in \mathbf{R}\}.$$

• Addition and multiplication on $\mathbf{C}$ are defined by
$$\begin{aligned} (a + bi) + (c + di) &= (a + c) + (b + d)i, \\ (a + bi)(c + di) &= (ac - bd) + (ad + bc)i; \end{aligned}$$
here $a, b, c, d \in \mathbf{R}$.


If $a \in \mathbf{R}$, then we identify $a + 0i$ with $a$. Thus we think of $\mathbf{R}$ as a subset of $\mathbf{C}$. We also usually write $0 + bi$ as $bi$, and we usually write $0 + 1i$ as $i$. You should verify that $i^2 = -1$.

With the definitions as above, C satisfies the usual rules of arithmetic. Specifically, 
with addition and multiplication defined as above, C is a field, as you should verify. 
Thus subtraction and division of complex numbers are defined as in any field. 


The field $\mathbf{C}$ cannot be made into an ordered field. However, the useful concept of an absolute value can still be defined on $\mathbf{C}$.

Much of this section may be review for many readers.

The symbol $i$ was first used to denote $\sqrt{-1}$ by Leonhard Euler (1707–1783) in 1777.


6.18 Definition $\operatorname{Re} z$; $\operatorname{Im} z$; absolute value; limits
Suppose $z = a + bi$, where $a$ and $b$ are real numbers.
• The real part of $z$, denoted $\operatorname{Re} z$, is defined by $\operatorname{Re} z = a$.
• The imaginary part of $z$, denoted $\operatorname{Im} z$, is defined by $\operatorname{Im} z = b$.
• The absolute value of $z$, denoted $|z|$, is defined by $|z| = \sqrt{a^2 + b^2}$.
• If $z_1, z_2, \ldots \in \mathbf{C}$ and $L \in \mathbf{C}$, then $\lim_{k\to\infty} z_k = L$ means $\lim_{k\to\infty} |z_k - L| = 0$.

For $b$ a real number, the usual definition of $|b|$ as a real number is consistent with the new definition just given of $|b|$ with $b$ thought of as a complex number. Note that if $z_1, z_2, \ldots$ is a sequence of complex numbers and $L \in \mathbf{C}$, then
$$\lim_{k\to\infty} z_k = L \iff \lim_{k\to\infty} \operatorname{Re} z_k = \operatorname{Re} L \text{ and } \lim_{k\to\infty} \operatorname{Im} z_k = \operatorname{Im} L.$$
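The arithmetic of 6.17 and the absolute value of 6.18 can be transcribed almost word for word. In the sketch below a complex number $a + bi$ is stored as the pair `(a, b)`; the names `add`, `mul`, and `modulus` are hypothetical helpers, not notation from the text. The sketch checks that $i^2 = -1$ and compares one product with Python's built-in `complex` type.

```python
import math

Pair = tuple[float, float]  # (a, b) stands for a + bi (our representation choice)


def add(z: Pair, w: Pair) -> Pair:
    """(a + bi) + (c + di) = (a + c) + (b + d)i."""
    a, b = z
    c, d = w
    return (a + c, b + d)


def mul(z: Pair, w: Pair) -> Pair:
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def modulus(z: Pair) -> float:
    """|a + bi| = sqrt(a^2 + b^2)."""
    a, b = z
    return math.hypot(a, b)


i = (0, 1)
print(mul(i, i))  # (-1, 0), that is, i^2 = -1

# Cross-check one product against Python's built-in complex type.
z, w = (2, 3), (-1, 4)
product = complex(*z) * complex(*w)
print(mul(z, w), (product.real, product.imag))  # (-14, 5) (-14.0, 5.0)
print(modulus((3, 4)))  # 5.0
```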

We will reduce questions concerning measurability and integration of a complex- 
valued function to the corresponding questions about the real and imaginary parts of 
the function. We begin this process with the following definition. 


6.19 Definition measurable complex-valued function

Suppose $(X, \mathcal{S})$ is a measurable space. A function $f\colon X \to \mathbf{C}$ is called $\mathcal{S}$-measurable if $\operatorname{Re} f$ and $\operatorname{Im} f$ are both $\mathcal{S}$-measurable functions.


See Exercise 5 in this section for two natural conditions that are equivalent to 
measurability for complex-valued functions. 

We will make frequent use of the following result. See Exercise 6 in this section 
for algebraic combinations of complex-valued measurable functions. 


6.20 $|f|^p$ is measurable if $f$ is measurable

Suppose $(X, \mathcal{S})$ is a measurable space, $f\colon X \to \mathbf{C}$ is an $\mathcal{S}$-measurable function, and $0 < p < \infty$. Then $|f|^p$ is an $\mathcal{S}$-measurable function.

Proof The functions $(\operatorname{Re} f)^2$ and $(\operatorname{Im} f)^2$ are $\mathcal{S}$-measurable because the square of an $\mathcal{S}$-measurable function is measurable (by Example 2.45). Thus the function $(\operatorname{Re} f)^2 + (\operatorname{Im} f)^2$ is $\mathcal{S}$-measurable (because the sum of two $\mathcal{S}$-measurable functions is $\mathcal{S}$-measurable by 2.46). Now $\bigl((\operatorname{Re} f)^2 + (\operatorname{Im} f)^2\bigr)^{p/2}$ is $\mathcal{S}$-measurable because it is the composition of a continuous function on $[0, \infty)$ and an $\mathcal{S}$-measurable function (see 2.44 and 2.41). In other words, $|f|^p$ is an $\mathcal{S}$-measurable function.


Now we define integration of a complex-valued function by separating the function 
into its real and imaginary parts. 


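6.21 Definition integral of a complex-valued function

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to \mathbf{C}$ is an $\mathcal{S}$-measurable function with $\int |f| \, d\mu < \infty$ [the collection of such functions is denoted $\mathcal{L}^1(\mu)$]. Then $\int f \, d\mu$ is defined by
$$\int f \, d\mu = \int (\operatorname{Re} f) \, d\mu + i \int (\operatorname{Im} f) \, d\mu.$$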


The integral of a complex-valued measurable function is defined above only when the absolute value of the function has a finite integral. In contrast, the integral of every nonnegative measurable function is defined (although the value may be $\infty$), and if $f$ is real valued then $\int f \, d\mu$ is defined to be $\int f^+ \, d\mu - \int f^- \, d\mu$ if at least one of $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ is finite.

You can easily show that if $f, g\colon X \to \mathbf{C}$ are $\mathcal{S}$-measurable functions such that $\int |f| \, d\mu < \infty$ and $\int |g| \, d\mu < \infty$, then
$$\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu.$$
Similarly, the definition of complex multiplication leads to the conclusion that
$$\int \alpha f \, d\mu = \alpha \int f \, d\mu$$
for all $\alpha \in \mathbf{C}$ (see Exercise 8).


The inequality in the result below concerning integration of complex-valued 
functions does not follow immediately from the corresponding result for real-valued 
functions. However, the small trick used in the proof below does give a reasonably 
simple proof. 


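6.22 bound on the absolute value of an integral

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to \mathbf{C}$ is an $\mathcal{S}$-measurable function such that $\int |f| \, d\mu < \infty$. Then
$$\Bigl| \int f \, d\mu \Bigr| \le \int |f| \, d\mu.$$

Proof The result clearly holds if $\int f \, d\mu = 0$. Thus assume that $\int f \, d\mu \neq 0$. Let
$$\alpha = \frac{\bigl| \int f \, d\mu \bigr|}{\int f \, d\mu}.$$
Then
$$\begin{aligned}
\Bigl| \int f \, d\mu \Bigr| &= \alpha \int f \, d\mu = \int \alpha f \, d\mu \\
&= \int \operatorname{Re}(\alpha f) \, d\mu + i \int \operatorname{Im}(\alpha f) \, d\mu \\
&= \int \operatorname{Re}(\alpha f) \, d\mu \\
&\le \int |\alpha f| \, d\mu \\
&= \int |f| \, d\mu,
\end{aligned}$$
where the second equality holds by Exercise 8, the fourth equality holds because $\bigl| \int f \, d\mu \bigr| \in \mathbf{R}$, the inequality on the fourth line holds because $\operatorname{Re} z \le |z|$ for every complex number $z$, and the equality in the last line holds because $|\alpha| = 1$.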
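The trick in the proof can be watched numerically on a finite measure space. Take $X = \{0, 1, 2\}$ with point masses $\mu(\{j\})$, so that $\int f \, d\mu$ is the weighted sum $\sum_j \mu(\{j\}) f(j)$. The values of $f$ and the masses below are hypothetical. Multiplying by $\alpha$ rotates the integral onto the positive real axis, and $\operatorname{Re}(\alpha f) \le |f|$ then gives the bound.

```python
def integrate(values, masses):
    """Integral over the finite set {0, ..., n-1} with mu({j}) = masses[j]."""
    return sum(m * v for v, m in zip(values, masses))


f = [1 + 2j, -3 + 0.5j, 2 - 1j]  # hypothetical values f(0), f(1), f(2)
mu = [0.5, 1.0, 2.0]             # hypothetical point masses

I = integrate(f, mu)             # 1.5 - 0.5j
alpha = abs(I) / I               # the unimodular number from the proof
rotated = integrate([alpha * v for v in f], mu)

print(abs(alpha))                # 1.0 up to rounding
print(rotated)                   # real part |I|, imaginary part about 0
print(abs(I), integrate([abs(v) for v in f], mu))  # left side <= right side
```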


Because of the result above, the Bounded Convergence Theorem (3.26) and the Dominated Convergence Theorem (3.31) hold if the functions $f_1, f_2, \ldots$ and $f$ in the statements of those theorems are allowed to be complex valued.

We now define the complex conjugate of a complex number.


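6.23 Definition complex conjugate; $\overline{z}$

Suppose $z \in \mathbf{C}$. The complex conjugate of $z \in \mathbf{C}$, denoted $\overline{z}$ (pronounced z-bar), is defined by
$$\overline{z} = \operatorname{Re} z - (\operatorname{Im} z)i.$$

For example, if $z = 5 + 7i$ then $\overline{z} = 5 - 7i$. Note that a complex number $z$ is a real number if and only if $z = \overline{z}$.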
The next result gives basic properties of the complex conjugate. 


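6.24 properties of complex conjugates

Suppose $w, z \in \mathbf{C}$. Then
• product of $z$ and $\overline{z}$
$z \overline{z} = |z|^2$;
• sum and difference of $z$ and $\overline{z}$
$z + \overline{z} = 2 \operatorname{Re} z$ and $z - \overline{z} = 2(\operatorname{Im} z)i$;
• additivity and multiplicativity of complex conjugate
$\overline{w + z} = \overline{w} + \overline{z}$ and $\overline{wz} = \overline{w}\,\overline{z}$;
• complex conjugate of complex conjugate
$\overline{\overline{z}} = z$;
• absolute value of complex conjugate
$|\overline{z}| = |z|$;
• integral of complex conjugate of a function
$\int \overline{f} \, d\mu = \overline{\int f \, d\mu}$ for every measure $\mu$ and every $f \in \mathcal{L}^1(\mu)$.

Proof The first item holds because
$$z \overline{z} = (\operatorname{Re} z + i \operatorname{Im} z)(\operatorname{Re} z - i \operatorname{Im} z) = (\operatorname{Re} z)^2 + (\operatorname{Im} z)^2 = |z|^2.$$

To prove the last item, suppose $\mu$ is a measure and $f \in \mathcal{L}^1(\mu)$. Then
$$\begin{aligned}
\int \overline{f} \, d\mu &= \int (\operatorname{Re} f - i \operatorname{Im} f) \, d\mu = \int \operatorname{Re} f \, d\mu - i \int \operatorname{Im} f \, d\mu \\
&= \overline{\int \operatorname{Re} f \, d\mu + i \int \operatorname{Im} f \, d\mu} \\
&= \overline{\int f \, d\mu},
\end{aligned}$$
as desired.

The straightforward proofs of the remaining items are left to the reader.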
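Python's built-in `complex` type provides `conjugate()`, `real`, `imag`, and `abs`, so each item of 6.24 can be spot-checked on sample values. This is a check on examples, not a proof. The sample values have small integer parts, so the floating-point arithmetic below is exact. The last item is checked for a hypothetical measure on a two-point set with point masses 2 and 5.

```python
w, z = complex(2, -1), complex(3, 4)  # sample values of our choosing

assert z * z.conjugate() == abs(z) ** 2                        # z zbar = |z|^2
assert z + z.conjugate() == 2 * z.real                         # z + zbar = 2 Re z
assert z - z.conjugate() == 2j * z.imag                        # z - zbar = 2 (Im z) i
assert (w + z).conjugate() == w.conjugate() + z.conjugate()    # additivity
assert (w * z).conjugate() == w.conjugate() * z.conjugate()    # multiplicativity
assert z.conjugate().conjugate() == z                          # conjugate of conjugate
assert abs(z.conjugate()) == abs(z)                            # |zbar| = |z|

# Integral of the conjugate equals the conjugate of the integral (hypothetical finite measure).
values, masses = [1 + 2j, -3 + 1j], [2, 5]
lhs = sum(m * v.conjugate() for v, m in zip(values, masses))
rhs = sum(m * v for v, m in zip(values, masses)).conjugate()
assert lhs == rhs == -13 - 9j
print("all checks passed")
```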
Vector Spaces and Subspaces


The structure and language of vector spaces will help us focus on certain features of 
collections of measurable functions. So that we can conveniently make definitions 
and prove theorems that apply to both real and complex numbers, we adopt the 
following notation. 


6.25 Definition F 


From now on, F stands for either R or C. 


In the definitions that follow, we use f and g to denote elements of V because in 
the crucial examples the elements of V are functions from a set X to F. 


6.26 Definition addition; scalar multiplication

• An addition on a set $V$ is a function that assigns an element $f + g \in V$ to each pair of elements $f, g \in V$.

• A scalar multiplication on a set $V$ is a function that assigns an element $\alpha f \in V$ to each $\alpha \in \mathbf{F}$ and each $f \in V$.


Now we are ready to give the formal definition of a vector space. 


6.27 Definition vector space

A vector space (over $\mathbf{F}$) is a set $V$ along with an addition on $V$ and a scalar multiplication on $V$ such that the following properties hold:

commutativity
$f + g = g + f$ for all $f, g \in V$;

associativity
$(f + g) + h = f + (g + h)$ and $(\alpha\beta)f = \alpha(\beta f)$ for all $f, g, h \in V$ and $\alpha, \beta \in \mathbf{F}$;

additive identity
there exists an element $0 \in V$ such that $f + 0 = f$ for all $f \in V$;

additive inverse
for every $f \in V$, there exists $g \in V$ such that $f + g = 0$;

multiplicative identity
$1f = f$ for all $f \in V$;

distributive properties
$\alpha(f + g) = \alpha f + \alpha g$ and $(\alpha + \beta)f = \alpha f + \beta f$ for all $\alpha, \beta \in \mathbf{F}$ and $f, g \in V$.

Most vector spaces that you will encounter are subsets of the vector space $\mathbf{F}^X$ presented in the next example.

6.28 Example the vector space $\mathbf{F}^X$


Suppose $X$ is a nonempty set. Let $\mathbf{F}^X$ denote the set of functions from $X$ to $\mathbf{F}$. Addition and scalar multiplication on $\mathbf{F}^X$ are defined as expected: for $f, g \in \mathbf{F}^X$ and $\alpha \in \mathbf{F}$, define
$$(f + g)(x) = f(x) + g(x) \quad\text{and}\quad (\alpha f)(x) = \alpha\bigl(f(x)\bigr)$$
for $x \in X$. Then, as you should verify, $\mathbf{F}^X$ is a vector space; the additive identity in this vector space is the function $0 \in \mathbf{F}^X$ defined by $0(x) = 0$ for all $x \in X$.
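The pointwise operations of Example 6.28 translate directly into code if elements of $\mathbf{F}^X$ are modeled as Python functions. This is only a sketch. The names `add`, `scale`, and `zero` are hypothetical, and the set $X$ is left implicit as whatever inputs the functions accept.

```python
from typing import Any, Callable

Vector = Callable[[Any], complex]  # an element of F^X: a function from X to F


def add(f: Vector, g: Vector) -> Vector:
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale(alpha: complex, f: Vector) -> Vector:
    """Pointwise scalar multiple: (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)


def zero(x: Any) -> complex:
    """The additive identity: 0(x) = 0 for every x in X."""
    return 0


f = lambda x: x ** 2
g = lambda x: 3 * x + 1
h = add(scale(2, f), g)           # h(x) = 2x^2 + 3x + 1
print([h(x) for x in range(4)])   # [1, 6, 15, 28]
print(add(f, zero)(5) == f(5))    # True: f + 0 = f at x = 5
```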


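6.29 Example $\mathbf{F}^n$; $\mathbf{F}^{\mathbf{Z}^+}$

Special case of the previous example: if $n \in \mathbf{Z}^+$ and $X = \{1, \ldots, n\}$, then $\mathbf{F}^X$ is the familiar space $\mathbf{R}^n$ or $\mathbf{C}^n$, depending upon whether $\mathbf{F} = \mathbf{R}$ or $\mathbf{F} = \mathbf{C}$.

Another special case: $\mathbf{F}^{\mathbf{Z}^+}$ is the vector space of all sequences of real numbers or complex numbers, again depending upon whether $\mathbf{F} = \mathbf{R}$ or $\mathbf{F} = \mathbf{C}$.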


By considering subspaces, we can greatly expand our examples of vector spaces. 


6.30 Definition subspace 


A subset U of V is called a subspace of V if U is also a vector space (using the 
same addition and scalar multiplication as on V). 


The next result gives the easiest way to check whether a subset of a vector space 
is a subspace. 


6.31 conditions for a subspace

A subset $U$ of $V$ is a subspace of $V$ if and only if $U$ satisfies the following three conditions:

• additive identity
$0 \in U$;

• closed under addition
$f, g \in U$ implies $f + g \in U$;

• closed under scalar multiplication
$\alpha \in \mathbf{F}$ and $f \in U$ implies $\alpha f \in U$.


Proof If U is a subspace of V, then U satisfies the three conditions above by the 
definition of vector space. 

Conversely, suppose U satisfies the three conditions above. The first condition 
above ensures that the additive identity of V is in U. 

The second condition above ensures that addition makes sense on U. The third 
condition ensures that scalar multiplication makes sense on U. 
If $f \in V$, then $0f = (0 + 0)f = 0f + 0f$. Adding the additive inverse of $0f$ to both sides of this equation shows that $0f = 0$. Now if $f \in U$, then $(-1)f$ is also in $U$ by the third condition above. Because $f + (-1)f = (1 + (-1))f = 0f = 0$, we see that $(-1)f$ is an additive inverse of $f$. Hence every element of $U$ has an additive inverse in $U$.

The other parts of the definition of a vector space, such as associativity and commutativity, are automatically satisfied for $U$ because they hold on the larger space $V$. Thus $U$ is a vector space and hence is a subspace of $V$.


The three conditions in 6.31 usually enable us to determine quickly whether a 
given subset of V is a subspace of V, as illustrated below. All the examples below 
except for the first bullet point involve concepts from measure theory. 


6.32 Example subspaces of $\mathbf{F}^X$

• The set $C([0, 1])$ of continuous real-valued functions on $[0, 1]$ is a vector space over $\mathbf{R}$ because the sum of two continuous functions is continuous and a constant multiple of a continuous function is continuous. In other words, $C([0, 1])$ is a subspace of $\mathbf{R}^{[0,1]}$.

• Suppose $(X, \mathcal{S})$ is a measurable space. Then the set of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{F}$ is a subspace of $\mathbf{F}^X$ because the sum of two $\mathcal{S}$-measurable functions is $\mathcal{S}$-measurable and a constant multiple of an $\mathcal{S}$-measurable function is $\mathcal{S}$-measurable.

• Suppose $(X, \mathcal{S}, \mu)$ is a measure space. Then the set $\mathcal{Z}(\mu)$ of $\mathcal{S}$-measurable functions $f$ from $X$ to $\mathbf{F}$ such that $f = 0$ almost everywhere [meaning that $\mu(\{x \in X : f(x) \neq 0\}) = 0$] is a vector space over $\mathbf{F}$ because the union of two sets with $\mu$-measure 0 is a set with $\mu$-measure 0 [which implies that $\mathcal{Z}(\mu)$ is closed under addition]. Note that $\mathcal{Z}(\mu)$ is a subspace of $\mathbf{F}^X$.

• Suppose $(X, \mathcal{S})$ is a measurable space. Then the set of bounded measurable functions from $X$ to $\mathbf{F}$ is a subspace of $\mathbf{F}^X$ because the sum of two bounded $\mathcal{S}$-measurable functions is a bounded $\mathcal{S}$-measurable function and a constant multiple of a bounded $\mathcal{S}$-measurable function is a bounded $\mathcal{S}$-measurable function.

• Suppose $(X, \mathcal{S}, \mu)$ is a measure space. Then the set of $\mathcal{S}$-measurable functions $f$ from $X$ to $\mathbf{F}$ such that $\int f \, d\mu = 0$ is a subspace of $\mathbf{F}^X$ because of standard properties of integration.

• Suppose $(X, \mathcal{S}, \mu)$ is a measure space. Then the set $\mathcal{L}^1(\mu)$ of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{F}$ such that $\int |f| \, d\mu < \infty$ is a subspace of $\mathbf{F}^X$ [we are now redefining $\mathcal{L}^1(\mu)$ to allow for the possibility that $\mathbf{F} = \mathbf{R}$ or $\mathbf{F} = \mathbf{C}$]. The set $\mathcal{L}^1(\mu)$ is closed under addition and scalar multiplication because $\int |f + g| \, d\mu \le \int |f| \, d\mu + \int |g| \, d\mu$ and $\int |\alpha f| \, d\mu = |\alpha| \int |f| \, d\mu$.

• The set $\ell^1$ of all sequences $(a_1, a_2, \ldots)$ of elements of $\mathbf{F}$ such that $\sum_{k=1}^\infty |a_k| < \infty$ is a subspace of $\mathbf{F}^{\mathbf{Z}^+}$. Note that $\ell^1$ is a special case of the example in the previous bullet point (take $\mu$ to be counting measure on $\mathbf{Z}^+$).
EXERCISES 6B
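1 Show that if $a, b \in \mathbf{R}$ with $a + bi \neq 0$, then
$$\frac{1}{a + bi} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\, i.$$

2 Suppose $z \in \mathbf{C}$. Prove that
$$\max\{|\operatorname{Re} z|, |\operatorname{Im} z|\} \le |z| \le \sqrt{2} \max\{|\operatorname{Re} z|, |\operatorname{Im} z|\}.$$

3 Suppose $z \in \mathbf{C}$. Prove that
$$\frac{|\operatorname{Re} z| + |\operatorname{Im} z|}{\sqrt{2}} \le |z| \le |\operatorname{Re} z| + |\operatorname{Im} z|.$$

4 Suppose $w, z \in \mathbf{C}$. Prove that $|wz| = |w|\,|z|$ and $|w + z| \le |w| + |z|$.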


5 Suppose $(X, \mathcal{S})$ is a measurable space and $f\colon X \to \mathbf{C}$ is a complex-valued function. For conditions (b) and (c) below, identify $\mathbf{C}$ with $\mathbf{R}^2$. Prove that the following are equivalent:
(a) $f$ is $\mathcal{S}$-measurable.
(b) $f^{-1}(G) \in \mathcal{S}$ for every open set $G$ in $\mathbf{R}^2$.
(c) $f^{-1}(B) \in \mathcal{S}$ for every Borel set $B \in \mathcal{B}_2$.

6 Suppose $(X, \mathcal{S})$ is a measurable space and $f, g\colon X \to \mathbf{C}$ are $\mathcal{S}$-measurable. Prove that
(a) $f + g$, $f - g$, and $fg$ are $\mathcal{S}$-measurable functions;
(b) if $g(x) \neq 0$ for all $x \in X$, then $f/g$ is an $\mathcal{S}$-measurable function.

7 Suppose $(X, \mathcal{S})$ is a measurable space and $f_1, f_2, \ldots$ is a sequence of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{C}$. Suppose $\lim_{k\to\infty} f_k(x)$ exists for each $x \in X$. Define $f\colon X \to \mathbf{C}$ by
$$f(x) = \lim_{k\to\infty} f_k(x).$$
Prove that $f$ is an $\mathcal{S}$-measurable function.


8 Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $f\colon X \to \mathbf{C}$ is an $\mathcal{S}$-measurable function such that $\int |f| \, d\mu < \infty$. Prove that if $\alpha \in \mathbf{C}$, then
$$\int \alpha f \, d\mu = \alpha \int f \, d\mu.$$

9 Suppose $V$ is a vector space. Show that the intersection of every collection of subspaces of $V$ is a subspace of $V$.

10 Suppose $V$ and $W$ are vector spaces. Define $V \times W$ by
$$V \times W = \{(f, g) : f \in V \text{ and } g \in W\}.$$
Define addition and scalar multiplication on $V \times W$ by
$$(f_1, g_1) + (f_2, g_2) = (f_1 + f_2, g_1 + g_2) \quad\text{and}\quad \alpha(f, g) = (\alpha f, \alpha g).$$
Prove that $V \times W$ is a vector space with these operations.
6C Normed Vector Spaces


Norms and Complete Norms 


This section begins with a crucial definition.

6.33 Definition norm; normed vector space

A norm on a vector space $V$ (over $\mathbf{F}$) is a function $\|\cdot\|\colon V \to [0, \infty)$ such that
• $\|f\| = 0$ if and only if $f = 0$ (positive definite);
• $\|\alpha f\| = |\alpha| \, \|f\|$ for all $\alpha \in \mathbf{F}$ and $f \in V$ (homogeneity);
• $\|f + g\| \le \|f\| + \|g\|$ for all $f, g \in V$ (triangle inequality).

A normed vector space is a pair $(V, \|\cdot\|)$, where $V$ is a vector space and $\|\cdot\|$ is a norm on $V$.


6.34 Example norms

• Suppose $n \in \mathbf{Z}^+$. Define $\|\cdot\|_1$ and $\|\cdot\|_\infty$ on $\mathbf{F}^n$ by
$$\|(a_1, \ldots, a_n)\|_1 = |a_1| + \cdots + |a_n|$$
and
$$\|(a_1, \ldots, a_n)\|_\infty = \max\{|a_1|, \ldots, |a_n|\}.$$
Then $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are norms on $\mathbf{F}^n$, as you should verify.

• On $\ell^1$ (see the last bullet point in Example 6.32 for the definition of $\ell^1$), define $\|\cdot\|_1$ by
$$\|(a_1, a_2, \ldots)\|_1 = \sum_{k=1}^\infty |a_k|.$$
Then $\|\cdot\|_1$ is a norm on $\ell^1$, as you should verify.

• Suppose $X$ is a nonempty set and $b(X)$ is the subspace of $\mathbf{F}^X$ consisting of the bounded functions from $X$ to $\mathbf{F}$. For $f$ a bounded function from $X$ to $\mathbf{F}$, define $\|f\|$ by
$$\|f\| = \sup\{|f(x)| : x \in X\}.$$
Then $\|\cdot\|$ is a norm on $b(X)$, as you should verify.

• Let $C([0, 1])$ denote the vector space of continuous functions from the interval $[0, 1]$ to $\mathbf{F}$. Define $\|\cdot\|$ on $C([0, 1])$ by
$$\|f\| = \int_0^1 |f|.$$
Then $\|\cdot\|$ is a norm on $C([0, 1])$, as you should verify.
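The two norms in the first bullet point are easy to compute, and the homogeneity and triangle-inequality axioms can be spot-checked numerically on random vectors in $\mathbf{C}^3$. This illustrates the claims; it does not replace the verification the text asks for. The function names, the fixed seed, and the rounding tolerance `1e-12` are our own choices.

```python
import random


def norm_1(a):
    """||(a_1, ..., a_n)||_1 = |a_1| + ... + |a_n|."""
    return sum(abs(x) for x in a)


def norm_inf(a):
    """||(a_1, ..., a_n)||_inf = max{|a_1|, ..., |a_n|}."""
    return max(abs(x) for x in a)


a = [3 - 4j, 1, -2j]
print(norm_1(a), norm_inf(a))  # 8.0 5.0

rng = random.Random(0)  # fixed seed (our choice) so the run is reproducible
TOL = 1e-12             # absorbs floating-point rounding


def random_vector(n=3):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


for _ in range(1000):
    f, g = random_vector(), random_vector()
    alpha = complex(rng.uniform(-2, 2), rng.uniform(-2, 2))
    for norm in (norm_1, norm_inf):
        # triangle inequality
        assert norm([x + y for x, y in zip(f, g)]) <= norm(f) + norm(g) + TOL
        # homogeneity
        assert abs(norm([alpha * x for x in f]) - abs(alpha) * norm(f)) < TOL
```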
Sometimes examples that do not satisfy a definition help you gain understanding.

6.35 Example not norms

• Let $\mathcal{L}^1(\mathbf{R})$ denote the vector space of Borel (or Lebesgue) measurable functions $f\colon \mathbf{R} \to \mathbf{F}$ such that $\int |f| \, d\lambda < \infty$, where $\lambda$ is Lebesgue measure on $\mathbf{R}$. Define $\|\cdot\|_1$ on $\mathcal{L}^1(\mathbf{R})$ by
$$\|f\|_1 = \int |f| \, d\lambda.$$
Then $\|\cdot\|_1$ satisfies the homogeneity condition and the triangle inequality on $\mathcal{L}^1(\mathbf{R})$, as you should verify. However, $\|\cdot\|_1$ is not a norm on $\mathcal{L}^1(\mathbf{R})$ because the positive definite condition is not satisfied. Specifically, if $E$ is a nonempty Borel subset of $\mathbf{R}$ with Lebesgue measure 0 (for example, $E$ might consist of a single element of $\mathbf{R}$), then $\|\chi_E\|_1 = 0$ but $\chi_E \neq 0$. In the next chapter, we will discuss a modification of $\mathcal{L}^1(\mathbf{R})$ that removes this problem.

• If $n \in \mathbf{Z}^+$ and $\|\cdot\|$ is defined on $\mathbf{F}^n$ by
$$\|(a_1, \ldots, a_n)\| = |a_1|^{1/2} + \cdots + |a_n|^{1/2},$$
then $\|\cdot\|$ satisfies the positive definite condition and the triangle inequality (as you should verify). However, $\|\cdot\|$ as defined above is not a norm because it does not satisfy the homogeneity condition.

• If $\|\cdot\|_{1/2}$ is defined on $\mathbf{F}^n$ by
$$\|(a_1, \ldots, a_n)\|_{1/2} = \bigl(|a_1|^{1/2} + \cdots + |a_n|^{1/2}\bigr)^2,$$
then $\|\cdot\|_{1/2}$ satisfies the positive definite condition and the homogeneity condition. However, if $n > 1$ then $\|\cdot\|_{1/2}$ is not a norm on $\mathbf{F}^n$ because the triangle inequality is not satisfied (as you should verify).
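For the last two bullet points, a single pair of vectors already shows what goes wrong. The sketch below uses the standard basis vectors $(1, 0)$ and $(0, 1)$ of $\mathbf{F}^2$ (our choice of test vectors). It only illustrates the failures and does not replace the verifications the text asks for; the function names are hypothetical.

```python
def sqrt_sum(a):
    """|a_1|^(1/2) + ... + |a_n|^(1/2), the second bullet point of Example 6.35."""
    return sum(abs(x) ** 0.5 for x in a)


def norm_half(a):
    """(|a_1|^(1/2) + ... + |a_n|^(1/2))^2, the third bullet point."""
    return sqrt_sum(a) ** 2


e1, e2 = [1, 0], [0, 1]  # test vectors of our choosing

# Homogeneity fails for sqrt_sum: scaling by 4 only doubles the value.
print(sqrt_sum([4 * x for x in e1]), 4 * sqrt_sum(e1))  # 2.0 4.0

# The triangle inequality fails for norm_half when n = 2.
s = [x + y for x, y in zip(e1, e2)]
print(norm_half(s), norm_half(e1) + norm_half(e2))  # 4.0 2.0
```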


The next result shows that every normed vector space is also a metric space in a 
natural fashion. 


6.36 normed vector spaces are metric spaces 


Suppose $(V, \|\cdot\|)$ is a normed vector space. Define $d\colon V \times V \to [0, \infty)$ by
$$d(f, g) = \|f - g\|.$$
Then $d$ is a metric on $V$.

Proof Suppose $f, g, h \in V$. Then
$$\begin{aligned}
d(f, h) = \|f - h\| &= \|(f - g) + (g - h)\| \\
&\le \|f - g\| + \|g - h\| \\
&= d(f, g) + d(g, h).
\end{aligned}$$
Thus the triangle inequality requirement for a metric is satisfied. The verification of the other required properties for a metric is left to the reader.
From now on, all metric space notions in the context of a normed vector space should be interpreted with respect to the metric introduced in the previous result. However, usually there is no need to introduce the metric $d$ explicitly; just use the norm of the difference of two elements. For example, suppose $(V, \|\cdot\|)$ is a normed vector space, $f_1, f_2, \ldots$ is a sequence in $V$, and $f \in V$. Then in the context of a normed vector space, the definition of limit (6.8) becomes the following statement:
$$\lim_{k\to\infty} f_k = f \text{ means } \lim_{k\to\infty} \|f_k - f\| = 0.$$

As another example, in the context of a normed vector space, the definition of a Cauchy sequence (6.12) becomes the following statement:

A sequence $f_1, f_2, \ldots$ in a normed vector space $(V, \|\cdot\|)$ is a Cauchy sequence if for every $\varepsilon > 0$, there exists $n \in \mathbf{Z}^+$ such that $\|f_j - f_k\| < \varepsilon$ for all integers $j \ge n$ and $k \ge n$.


Every sequence in a normed vector space that has a limit is a Cauchy sequence 
(see 6.13). Normed vector spaces that satisfy the converse have a special name. 


6.37 Definition Banach space

A complete normed vector space is called a Banach space.

In other words, a normed vector space $V$ is a Banach space if every Cauchy sequence in $V$ converges to some element of $V$.

The verifications of the assertions in Examples 6.38 and 6.39 below are left to the reader as exercises.

In a slight abuse of terminology, we often refer to a normed vector space $V$ without mentioning the norm $\|\cdot\|$. When that happens, you should assume that a norm $\|\cdot\|$ lurks nearby, even if it is not explicitly displayed.

6.38 Example Banach spaces

• The vector space $C([0, 1])$ with the norm defined by $\|f\| = \sup_{[0,1]} |f|$ is a Banach space.

• The vector space $\ell^1$ with the norm defined by $\|(a_1, a_2, \ldots)\|_1 = \sum_{k=1}^\infty |a_k|$ is a Banach space.

6.39 Example not a Banach space

• The vector space $C([0, 1])$ with the norm defined by $\|f\| = \int_0^1 |f|$ is not a Banach space.

• The vector space $\ell^1$ with the norm defined by $\|(a_1, a_2, \ldots)\|_\infty = \sup_{k \in \mathbf{Z}^+} |a_k|$ is not a Banach space.
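For the second bullet point of 6.39, one natural candidate (our choice; the text leaves the verification as an exercise) is the sequence $x_n = (1, \tfrac12, \ldots, \tfrac1n, 0, 0, \ldots)$ in $\ell^1$. For $m > n$ we have $\|x_m - x_n\|_\infty = \frac{1}{n+1}$, so the sequence is Cauchy for $\|\cdot\|_\infty$. Meanwhile $\|x_n\|_1$ is the $n$th harmonic number, which grows without bound; a limit in $\ell^1$ would have to be $(1, \tfrac12, \tfrac13, \ldots)$, which is not in $\ell^1$. The sketch below checks the two computations with exact fractions on finitely many coordinates.

```python
from fractions import Fraction

# Our choice of sequence; the text does not specify one.


def x(n: int, length: int) -> list[Fraction]:
    """First `length` coordinates of x_n = (1, 1/2, ..., 1/n, 0, 0, ...)."""
    return [Fraction(1, k) if k <= n else Fraction(0) for k in range(1, length + 1)]


def sup_dist(u, v):
    """||u - v||_inf for sequences that vanish beyond the stored coordinates."""
    return max(abs(a - b) for a, b in zip(u, v))


LENGTH = 50  # every x_n with n <= 50 vanishes beyond coordinate 50

# Cauchy in the sup norm: ||x_m - x_n||_inf = 1/(n + 1) whenever m > n.
for n, m in [(5, 10), (10, 40), (20, 50)]:
    assert sup_dist(x(n, LENGTH), x(m, LENGTH)) == Fraction(1, n + 1)

# The l^1 norms are harmonic numbers 1 + 1/2 + ... + 1/n, which are unbounded.
print(float(sum(x(50, LENGTH))))  # about 4.499
```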
6.40 Definition infinite sum in a normed vector space

Suppose $g_1, g_2, \ldots$ is a sequence in a normed vector space $V$. Then $\sum_{k=1}^\infty g_k$ is defined by
$$\sum_{k=1}^\infty g_k = \lim_{n\to\infty} \sum_{k=1}^n g_k$$
if this limit exists, in which case the infinite series is said to converge.

Recall from your calculus course that if $a_1, a_2, \ldots$ is a sequence of real numbers such that $\sum_{k=1}^\infty |a_k| < \infty$, then $\sum_{k=1}^\infty a_k$ converges. The next result states that the analogous property for normed vector spaces characterizes Banach spaces.

6.41 $\Bigl(\sum_{k=1}^\infty \|g_k\| < \infty \implies \sum_{k=1}^\infty g_k \text{ converges}\Bigr) \iff \text{Banach space}$

Suppose $V$ is a normed vector space. Then $V$ is a Banach space if and only if $\sum_{k=1}^\infty g_k$ converges for every sequence $g_1, g_2, \ldots$ in $V$ such that $\sum_{k=1}^\infty \|g_k\| < \infty$.


Proof First suppose $V$ is a Banach space. Suppose $g_1, g_2, \ldots$ is a sequence in $V$ such that $\sum_{k=1}^\infty \|g_k\| < \infty$. Suppose $\varepsilon > 0$. Let $n \in \mathbf{Z}^+$ be such that $\sum_{m=n}^\infty \|g_m\| < \varepsilon$. For $j \in \mathbf{Z}^+$, let $f_j$ denote the partial sum defined by
$$f_j = g_1 + \cdots + g_j.$$
If $k > j \ge n$, then
$$\begin{aligned}
\|f_k - f_j\| &= \|g_{j+1} + \cdots + g_k\| \\
&\le \|g_{j+1}\| + \cdots + \|g_k\| \\
&\le \sum_{m=n}^\infty \|g_m\| \\
&< \varepsilon.
\end{aligned}$$
Thus $f_1, f_2, \ldots$ is a Cauchy sequence in $V$. Because $V$ is a Banach space, we conclude that $f_1, f_2, \ldots$ converges to some element of $V$, which is precisely what it means for $\sum_{k=1}^\infty g_k$ to converge, completing one direction of the proof.

To prove the other direction, suppose $\sum_{k=1}^\infty g_k$ converges for every sequence $g_1, g_2, \ldots$ in $V$ such that $\sum_{k=1}^\infty \|g_k\| < \infty$. Suppose $f_1, f_2, \ldots$ is a Cauchy sequence in $V$. We want to prove that $f_1, f_2, \ldots$ converges to some element of $V$. It suffices to show that some subsequence of $f_1, f_2, \ldots$ converges (by Exercise 14 in Section 6A). Dropping to a subsequence (but not relabeling), chosen so that $\|f_k - f_{k-1}\| \le 2^{-k}$ for each $k \ge 2$ (possible because the sequence is Cauchy), and setting $f_0 = 0$, we can assume that
$$\sum_{k=1}^\infty \|f_k - f_{k-1}\| < \infty.$$
Hence $\sum_{k=1}^\infty (f_k - f_{k-1})$ converges. The partial sum of this series after $n$ terms is $f_n$. Thus $\lim_{n\to\infty} f_n$ exists, completing the proof.
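The estimate at the heart of the first half of the proof is $\|f_k - f_j\| \le \|g_{j+1}\| + \cdots + \|g_k\|$. The sketch below checks it with exact rational arithmetic in $\mathbf{R}^2$ with the norm $\|\cdot\|_1$, for the hypothetical series $g_k = (2^{-k}, (-1)^k 3^{-k})$, whose norms are summable ($\sum_{k=1}^\infty \|g_k\|_1 = \tfrac32$). Because $\mathbf{R}^2$ is complete, the partial sums approach the limit $(1, -\tfrac14)$.

```python
from fractions import Fraction


def g(k: int) -> tuple[Fraction, Fraction]:
    """Hypothetical terms g_k = (2^(-k), (-1)^k 3^(-k)) in R^2."""
    return (Fraction(1, 2 ** k), Fraction((-1) ** k, 3 ** k))


def norm_1(v) -> Fraction:
    """The norm ||(a, b)||_1 = |a| + |b| on R^2."""
    return sum(abs(c) for c in v)


def partial_sums(n: int) -> list[tuple[Fraction, Fraction]]:
    """Return [f_0, f_1, ..., f_n], where f_0 = 0 and f_j = g_1 + ... + g_j."""
    sums = [(Fraction(0), Fraction(0))]
    for k in range(1, n + 1):
        a, b = sums[-1]
        c, d = g(k)
        sums.append((a + c, b + d))
    return sums


N = 30
f = partial_sums(N)
for j in range(N + 1):
    for k in range(j + 1, N + 1):
        diff = (f[k][0] - f[j][0], f[k][1] - f[j][1])
        bound = sum(norm_1(g(m)) for m in range(j + 1, k + 1))
        assert norm_1(diff) <= bound  # ||f_k - f_j|| <= ||g_{j+1}|| + ... + ||g_k||

print(float(f[N][0]), float(f[N][1]))  # close to the limit (1, -1/4)
```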
Bounded Linear Maps


When dealing with two or more vector spaces, as in the definition below, assume that 
the vector spaces are over the same field (either R or C, but denoted in this book as F 
to give us the flexibility to consider both cases). 

The notation $Tf$, in addition to the standard functional notation $T(f)$, is often
used when considering linear maps, which we now define. 


6.42 Definition linear map

Suppose $V$ and $W$ are vector spaces. A function $T\colon V \to W$ is called linear if
• $T(f + g) = Tf + Tg$ for all $f, g \in V$;
• $T(\alpha f) = \alpha Tf$ for all $\alpha \in \mathbf{F}$ and $f \in V$.

A linear function is often called a linear map.


The set of linear maps from a vector space V to a vector space W is itself a vector 
space, using the usual operations of addition and scalar multiplication of functions. 
Most attention in analysis focuses on the subset of bounded linear functions, defined 
below, which we will see is itself a normed vector space. 

In the next definition, we have two normed vector spaces, V and W, which may 
have different norms. However, we use the same notation ||-|| for both norms (and 
for the norm of a linear map from V to W) because the context makes the meaning 
clear. For example, in the definition below, $f$ is in $V$ and thus $\|f\|$ refers to the norm in $V$. Similarly, $Tf \in W$ and thus $\|Tf\|$ refers to the norm in $W$.


6.43 Definition bounded linear map; $\|T\|$; $\mathcal{B}(V, W)$

Suppose $V$ and $W$ are normed vector spaces and $T\colon V \to W$ is a linear map.

• The norm of $T$, denoted $\|T\|$, is defined by
$$\|T\| = \sup\{\|Tf\| : f \in V \text{ and } \|f\| \le 1\}.$$

• $T$ is called bounded if $\|T\| < \infty$.

• The set of bounded linear maps from $V$ to $W$ is denoted $\mathcal{B}(V, W)$.

6.44 Example bounded linear map

Let $C([0, 3])$ be the normed vector space of continuous functions from $[0, 3]$ to $\mathbf{F}$, with $\|f\| = \sup_{[0,3]} |f|$. Define $T\colon C([0, 3]) \to C([0, 3])$ by
$$(Tf)(x) = x^2 f(x).$$
Then $T$ is a bounded linear map and $\|T\| = 9$, as you should verify.
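One way to build intuition for $\|T\| = 9$: the constant function $1$ has norm $1$ and $\|T1\| = \sup_{[0,3]} x^2 = 9$, while $x^2 \le 9$ on $[0, 3]$ keeps $\|Tf\|$ at most $9\|f\|$. The sketch below approximates the supremum norm by a maximum over a finite grid, a simplification that can only underestimate the true supremum, and checks a few hypothetical sample functions.

```python
def grid_sup(f, grid):
    """Approximate sup over [0, 3] of |f| by a maximum over finitely many points."""
    return max(abs(f(x)) for x in grid)


def T(f):
    """(Tf)(x) = x^2 f(x), the operator of Example 6.44."""
    return lambda x: x ** 2 * f(x)


grid = [3 * j / 1000 for j in range(1001)]  # 0.0, 0.003, ..., 3.0 (our grid choice)


def one(x):
    return 1.0


print(grid_sup(T(one), grid))  # 9.0, attained at x = 3

# Hypothetical sample functions with norm at most 1 go to functions with norm at most 9.
samples = [lambda x: x / 3 - 1, lambda x: (x - 1.5) / 1.5, lambda x: 0.5]
for f in samples:
    assert grid_sup(f, grid) <= 1
    assert grid_sup(T(f), grid) <= 9
```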
6.45 Example linear map that is not bounded

Let $V$ be the normed vector space of sequences $(a_1, a_2, \ldots)$ of elements of $\mathbf{F}$ such that $a_k = 0$ for all but finitely many $k \in \mathbf{Z}^+$, with $\|(a_1, a_2, \ldots)\|_\infty = \max_{k \in \mathbf{Z}^+} |a_k|$. Define $T\colon V \to V$ by
$$T(a_1, a_2, a_3, \ldots) = (a_1, 2a_2, 3a_3, \ldots).$$
Then $T$ is a linear map that is not bounded, as you should verify.
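A quick way to see the unboundedness is to feed $T$ the unit sequences $e_n$ (our notation, not the text's), which have $1$ in slot $n$ and $0$ elsewhere: $\|e_n\|_\infty = 1$ but $\|Te_n\|_\infty = n$. In the sketch below, an element of $V$ is stored as the finite list of its entries up to the last one that might be nonzero, with the trailing zeros left implicit (a representation choice of ours).

```python
def T(a):
    """T(a_1, a_2, a_3, ...) = (a_1, 2 a_2, 3 a_3, ...) on finitely supported sequences."""
    return [k * x for k, x in enumerate(a, start=1)]


def sup_norm(a):
    """||a||_inf = max |a_k|; the empty list represents the zero sequence."""
    return max((abs(x) for x in a), default=0)


def e(n):
    """The sequence with 1 in slot n and 0 elsewhere (trailing zeros implicit)."""
    return [0] * (n - 1) + [1]


for n in (1, 10, 1000):
    print(n, sup_norm(e(n)), sup_norm(T(e(n))))  # ||e_n|| = 1 but ||T e_n|| = n
```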


The next result shows that if V and W are normed vector spaces, then B(V, W) is 
a normed vector space with the norm defined above. 


6.46 $\|\cdot\|$ is a norm on $\mathcal{B}(V, W)$

Suppose $V$ and $W$ are normed vector spaces. Then $\|S + T\| \le \|S\| + \|T\|$ and $\|\alpha T\| = |\alpha| \, \|T\|$ for all $S, T \in \mathcal{B}(V, W)$ and all $\alpha \in \mathbf{F}$. Furthermore, the function $\|\cdot\|$ is a norm on $\mathcal{B}(V, W)$.

Proof Suppose $S, T \in \mathcal{B}(V, W)$. Then
$$\begin{aligned}
\|S + T\| &= \sup\{\|(S + T)f\| : f \in V \text{ and } \|f\| \le 1\} \\
&\le \sup\{\|Sf\| + \|Tf\| : f \in V \text{ and } \|f\| \le 1\} \\
&\le \sup\{\|Sf\| : f \in V \text{ and } \|f\| \le 1\} + \sup\{\|Tf\| : f \in V \text{ and } \|f\| \le 1\} \\
&= \|S\| + \|T\|.
\end{aligned}$$
The inequality above shows that $\|\cdot\|$ satisfies the triangle inequality on $\mathcal{B}(V, W)$. The verification of the other properties required for a normed vector space is left to the reader.

Be sure that you are comfortable using all four equivalent formulas for $\|T\|$ shown in Exercise 16. For example, you should often think of $\|T\|$ as the smallest number such that $\|Tf\| \le \|T\| \, \|f\|$ for all $f$ in the domain of $T$.

Note that in the next result, the hypothesis requires W to be a Banach space but 
there is no requirement for V to be a Banach space. 


6.47 $\mathcal{B}(V, W)$ is a Banach space if $W$ is a Banach space

Suppose $V$ is a normed vector space and $W$ is a Banach space. Then $\mathcal{B}(V, W)$ is a Banach space.

Proof Suppose $T_1, T_2, \ldots$ is a Cauchy sequence in $\mathcal{B}(V, W)$. If $f \in V$, then
$$\|T_j f - T_k f\| \le \|T_j - T_k\| \, \|f\|,$$
which implies that $T_1 f, T_2 f, \ldots$ is a Cauchy sequence in $W$. Because $W$ is a Banach space, this implies that $T_1 f, T_2 f, \ldots$ has a limit in $W$, which we call $Tf$.

We have now defined a function $T\colon V \to W$. The reader should verify that $T$ is a linear map. Clearly
$$\begin{aligned}
\|Tf\| &\le \sup\{\|T_k f\| : k \in \mathbf{Z}^+\} \\
&\le \bigl(\sup\{\|T_k\| : k \in \mathbf{Z}^+\}\bigr) \|f\|
\end{aligned}$$
for each $f \in V$. The last supremum above is finite because every Cauchy sequence is bounded (see Exercise 4). Thus $T \in \mathcal{B}(V, W)$.

We still need to show that $\lim_{j\to\infty} \|T_j - T\| = 0$. To do this, suppose $\varepsilon > 0$. Let $n \in \mathbf{Z}^+$ be such that $\|T_j - T_k\| < \varepsilon$ for all $j \ge n$ and $k \ge n$. Suppose $j \ge n$ and suppose $f \in V$. Then
$$\begin{aligned}
\|(T_j - T)f\| &= \lim_{k\to\infty} \|T_j f - T_k f\| \\
&\le \varepsilon \|f\|.
\end{aligned}$$
Thus $\|T_j - T\| \le \varepsilon$, completing the proof.


The next result shows that the phrase bounded linear map means the same as the 
phrase continuous linear map. 


6.48 continuity is equivalent to boundedness for linear maps 


A linear map from one normed vector space to another normed vector space is 
continuous if and only if it is bounded. 


Proof Suppose $V$ and $W$ are normed vector spaces and $T\colon V \to W$ is linear.

First suppose $T$ is not bounded. Thus there exists a sequence $f_1, f_2, \ldots$ in $V$ such that $\|f_k\| \le 1$ for each $k \in \mathbf{Z}^+$ and $\|Tf_k\| \to \infty$ as $k \to \infty$; by discarding finitely many terms, we can assume that $\|Tf_k\| > 0$ for every $k$. Hence
$$\lim_{k\to\infty} \frac{f_k}{\|Tf_k\|} = 0 \quad\text{and}\quad T\Bigl(\frac{f_k}{\|Tf_k\|}\Bigr) = \frac{Tf_k}{\|Tf_k\|} \not\to 0,$$
where the nonconvergence to 0 holds because $Tf_k/\|Tf_k\|$ has norm 1 for every $k \in \mathbf{Z}^+$. The displayed line above implies that $T$ is not continuous, completing the proof in one direction.

To prove the other direction, now suppose $T$ is bounded. Suppose $f \in V$ and $f_1, f_2, \ldots$ is a sequence in $V$ such that $\lim_{k\to\infty} f_k = f$. Then
$$\begin{aligned}
\|Tf_k - Tf\| &= \|T(f_k - f)\| \\
&\le \|T\| \, \|f_k - f\|.
\end{aligned}$$
Thus $\lim_{k\to\infty} Tf_k = Tf$. Hence $T$ is continuous, completing the proof in the other direction.

Exercise 18 gives several additional equivalent conditions for a linear map to be 
continuous. 
EXERCISES 6C
1 Show that the map $f \mapsto \|f\|$ from a normed vector space $V$ to $\mathbf{F}$ is continuous (where the norm on $\mathbf{F}$ is the usual absolute value).

2 Prove that if $V$ is a normed vector space, $f \in V$, and $r > 0$, then
$$\overline{B(f, r)} = \overline{B}(f, r).$$

3 Show that the functions defined in the last two bullet points of Example 6.35 are not norms.

4 Prove that each Cauchy sequence in a normed vector space is bounded (meaning that there is a real number that is greater than the norm of every element in the Cauchy sequence).

5 Show that if $n \in \mathbf{Z}^+$, then $\mathbf{F}^n$ is a Banach space with both the norms used in the first bullet point of Example 6.34.

6 Suppose $X$ is a nonempty set and $b(X)$ is the vector space of bounded functions from $X$ to $\mathbf{F}$. Prove that if $\|\cdot\|$ is defined on $b(X)$ by $\|f\| = \sup_X |f|$, then $b(X)$ is a Banach space.

7 Show that $\ell^1$ with the norm defined by $\|(a_1, a_2, \ldots)\|_\infty = \sup_{k \in \mathbf{Z}^+} |a_k|$ is not a Banach space.

8 Show that $\ell^1$ with the norm defined by $\|(a_1, a_2, \ldots)\|_1 = \sum_{k=1}^\infty |a_k|$ is a Banach space.

9 Show that the vector space $C([0, 1])$ of continuous functions from $[0, 1]$ to $\mathbf{F}$ with the norm defined by $\|f\| = \int_0^1 |f|$ is not a Banach space.


Suppose U is a subspace of a normed vector space V such that some open ball 
of V is contained in U. Prove that U = V. 


Prove that the only subsets of a normed vector space V that are both open and 
closed are @ and V. 


Suppose V is a normed vector space. Prove that the closure of each subspace of 
V is a subspace of V. 


Suppose U is a normed vector space. Let d be the metric on U defined by 
d(f,g) = \|f —g|| for f,g © U. Let V be the complete metric space con- 
structed in Exercise 16 in Section 6A. 


(a) Show that the set V is a vector space under natural operations of addition 
and scalar multiplication. 


(b) Show that there is a natural way to make V into a normed vector space and 
that with this norm, V is a Banach space. 


(c) Explain why (b) shows that every normed vector space is a subspace of 
some Banach space. 


14 Suppose U is a subspace of a normed vector space V. Suppose also that W is a Banach space and $S\colon U \to W$ is a bounded linear map.

(a) Prove that there exists a unique continuous function $T\colon \overline{U} \to W$ such that $T|_U = S$.

(b) Prove that the function T in part (a) is a bounded linear map from $\overline{U}$ to W and $\|T\| = \|S\|$.

(c) Give an example to show that part (a) can fail if the assumption that W is a Banach space is replaced by the assumption that W is a normed vector space.

15 For readers familiar with the quotient of a vector space and a subspace: Suppose V is a normed vector space and U is a subspace of V. Define $\|\cdot\|$ on $V/U$ by
$$\|f + U\| = \inf\{\|f + g\| : g \in U\}.$$

(a) Prove that $\|\cdot\|$ is a norm on $V/U$ if and only if U is a closed subspace of V.

(b) Prove that if V is a Banach space and U is a closed subspace of V, then $V/U$ (with the norm defined above) is a Banach space.

(c) Prove that if U is a Banach space (with the norm it inherits from V) and $V/U$ is a Banach space (with the norm defined above), then V is a Banach space.

16 Suppose V and W are normed vector spaces with $V \neq \{0\}$ and $T\colon V \to W$ is a linear map.

(a) Show that $\|T\| = \sup\{\|Tf\| : f \in V \text{ and } \|f\| < 1\}$.

(b) Show that $\|T\| = \sup\{\|Tf\| : f \in V \text{ and } \|f\| = 1\}$.

(c) Show that $\|T\| = \inf\{c \in [0,\infty) : \|Tf\| \le c\|f\| \text{ for all } f \in V\}$.

(d) Show that $\|T\| = \sup\Bigl\{\dfrac{\|Tf\|}{\|f\|} : f \in V \text{ and } f \neq 0\Bigr\}$.

17 Suppose U, V, and W are normed vector spaces and $T\colon U \to V$ and $S\colon V \to W$ are linear. Prove that $\|S \circ T\| \le \|S\|\,\|T\|$.

18 Suppose V and W are normed vector spaces and $T\colon V \to W$ is a linear map. Prove that the following are equivalent:

(a) T is bounded.

(b) There exists $f \in V$ such that T is continuous at f.

(c) T is uniformly continuous (which means that for every $\varepsilon > 0$, there exists $\delta > 0$ such that $\|Tf - Tg\| < \varepsilon$ for all $f, g \in V$ with $\|f - g\| < \delta$).

(d) $T^{-1}(B(0,r))$ is an open subset of V for some $r > 0$.
6D Linear Functionals


Bounded Linear Functionals


Linear maps into the scalar field F are so important that they get a special name.


A linear functional on a vector space V is a linear map from V to F.


When we think of the scalar field F as a normed vector space, as in the next example, the norm $\|z\|$ of a number $z \in \mathbf{F}$ is always intended to be just the usual absolute value $|z|$. This norm makes F a Banach space.


6.50 Example linear functional 


Let V be the vector space of sequences $(a_1, a_2, \ldots)$ of elements of F such that $a_k = 0$ for all but finitely many $k \in \mathbf{Z}^+$. Define $\varphi\colon V \to \mathbf{F}$ by

$$\varphi(a_1, a_2, \ldots) = \sum_{k=1}^{\infty} a_k.$$

Then φ is a linear functional on V.

• If we make V a normed vector space with the norm $\|(a_1, a_2, \ldots)\|_1 = \sum_{k=1}^\infty |a_k|$, then φ is a bounded linear functional on V, as you should verify.

• If we make V a normed vector space with the norm $\|(a_1, a_2, \ldots)\|_\infty = \max_{k\in\mathbf{Z}^+} |a_k|$, then φ is not a bounded linear functional on V, as you should verify.
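Both verifications can be spot-checked numerically. In the sketch below, a finitely supported sequence is stored as the list of its first few terms (an implementation choice). Exact rationals show that $|\varphi(a)| \le \|a\|_1$ on a sample, while the sequence of n ones has $\|\cdot\|_\infty$ norm 1 and $\varphi$ value n, so no bound $|\varphi(a)| \le c\|a\|_\infty$ can hold.

```python
from fractions import Fraction


def phi(a: list[Fraction]) -> Fraction:
    """phi(a_1, a_2, ...) = a_1 + a_2 + ... for a finitely supported sequence."""
    return sum(a, Fraction(0))


def norm_1(a: list[Fraction]) -> Fraction:
    return sum((abs(x) for x in a), Fraction(0))


def norm_inf(a: list[Fraction]) -> Fraction:
    return max((abs(x) for x in a), default=Fraction(0))


# Bounded for ||.||_1: |phi(a)| <= ||a||_1 by the triangle inequality.
sample = [Fraction(3), Fraction(-5, 2), Fraction(1, 7), Fraction(0), Fraction(-4)]
print(abs(phi(sample)) <= norm_1(sample))

# Unbounded for ||.||_inf: n ones give phi = n while the max norm stays 1.
for n in (1, 10, 1000):
    ones = [Fraction(1)] * n
    print(n, phi(ones) / norm_inf(ones))
```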


6.51 Definition null space

Suppose V and W are vector spaces and $T\colon V \to W$ is a linear map. Then the null space of T is denoted by null T and is defined by

$$\operatorname{null} T = \{f \in V : Tf = 0\}.$$

If T is a linear map on a vector space V, then null T is a subspace of V, as you should verify. If T is a continuous linear map from a normed vector space V to a normed vector space W, then null T is a closed subspace of V because $\operatorname{null} T = T^{-1}(\{0\})$ and the inverse image of the closed set {0} is closed [by 6.11(d)].

The converse of the last sentence fails, because a linear map between normed vector spaces can have a closed null space but not be continuous. For example, the linear map in 6.45 has a closed null space (equal to {0}) but it is not continuous.

However, the next result states that for linear functionals, as opposed to more general linear maps, having a closed null space is equivalent to continuity.

The term kernel is also used in the mathematics literature with the same meaning as null space. This book uses null space instead of kernel because null space better captures the connection with 0.
6.52 bounded linear functionals

Suppose V is a normed vector space and $\varphi\colon V \to \mathbf{F}$ is a linear functional that is not identically 0. Then the following are equivalent:

(a) φ is a bounded linear functional.

(b) φ is a continuous linear functional.

(c) null φ is a closed subspace of V.

(d) $\overline{\operatorname{null}\varphi} \neq V$.

Proof The equivalence of (a) and (b) is just a special case of 6.48.

To prove that (b) implies (c), suppose φ is a continuous linear functional. Then null φ, which is the inverse image of the closed set {0}, is a closed subset of V by 6.11(d). Thus (b) implies (c).

To prove that (c) implies (a), we will show that the negation of (a) implies the negation of (c). Thus suppose φ is not bounded. Thus there is a sequence $f_1, f_2, \ldots$ in V such that $\|f_k\| \le 1$ and $|\varphi(f_k)| \ge k$ for each $k \in \mathbf{Z}^+$. Now

$$\frac{f_1}{\varphi(f_1)} - \frac{f_k}{\varphi(f_k)} \in \operatorname{null}\varphi \quad\text{for each } k \in \mathbf{Z}^+$$

and, because $\|f_k/\varphi(f_k)\| \le 1/k$,

$$\lim_{k\to\infty}\Bigl(\frac{f_1}{\varphi(f_1)} - \frac{f_k}{\varphi(f_k)}\Bigr) = \frac{f_1}{\varphi(f_1)}.$$

Clearly

$$\varphi\Bigl(\frac{f_1}{\varphi(f_1)}\Bigr) = 1 \quad\text{and thus}\quad \frac{f_1}{\varphi(f_1)} \notin \operatorname{null}\varphi.$$

The last three displayed items imply that null φ is not closed, completing the proof that the negation of (a) implies the negation of (c). Thus (c) implies (a).

This proof makes major use of dividing by expressions of the form φ(f), which would not make sense for a linear mapping into a vector space other than F.

We now know that (a), (b), and (c) are equivalent to each other.

Using the hypothesis that φ is not identically 0, we see that (c) implies (d). To complete the proof, we need only show that (d) implies (c), which we will do by showing that the negation of (c) implies the negation of (d). Thus suppose null φ is not a closed subspace of V. Because null φ is a subspace of V, we know that $\overline{\operatorname{null}\varphi}$ is also a subspace of V (see Exercise 12 in Section 6C). Let $f \in \overline{\operatorname{null}\varphi} \setminus \operatorname{null}\varphi$, so $\varphi(f) \neq 0$. Suppose $g \in V$. Then

$$g = \Bigl(g - \frac{\varphi(g)}{\varphi(f)}\,f\Bigr) + \frac{\varphi(g)}{\varphi(f)}\,f.$$

The term in large parentheses above is in null φ and hence is in $\overline{\operatorname{null}\varphi}$. The term above following the plus sign is a scalar multiple of f and thus is in $\overline{\operatorname{null}\varphi}$. Because the equation above writes g as the sum of two elements of $\overline{\operatorname{null}\varphi}$, we conclude that $g \in \overline{\operatorname{null}\varphi}$. Hence we have shown that $V = \overline{\operatorname{null}\varphi}$, completing the proof that the negation of (c) implies the negation of (d).
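The proof that (c) implies (a) can be traced concretely with the unbounded functional from the second bullet point of Example 6.50. The choice of $f_k$ below is ours: $f_k$ is the sequence of k ones, so $\|f_k\|_\infty = 1$ and $\varphi(f_k) = k$. The vectors $d_k = f_1/\varphi(f_1) - f_k/\varphi(f_k)$ lie in null φ and are at distance $1/k$ from $e_1 = f_1/\varphi(f_1)$, which is not in null φ. This is a minimal sketch with exact rationals.

```python
from fractions import Fraction


def phi(a: list[Fraction]) -> Fraction:
    """The functional of Example 6.50: sum of the terms."""
    return sum(a, Fraction(0))


def sup_norm(a: list[Fraction]) -> Fraction:
    return max((abs(x) for x in a), default=Fraction(0))


def proof_sequence(k: int) -> tuple[Fraction, Fraction]:
    """Return (phi(d_k), ||d_k - e_1||_inf) for d_k = f_1/phi(f_1) - f_k/phi(f_k)."""
    f_1 = [Fraction(1)]
    f_k = [Fraction(1)] * k                       # ||f_k||_inf = 1 and phi(f_k) = k
    e_1 = [x / phi(f_1) for x in f_1] + [Fraction(0)] * (k - 1)   # f_1/phi(f_1), padded
    d_k = [x - y / phi(f_k) for x, y in zip(e_1, f_k)]
    diff = [x - y for x, y in zip(d_k, e_1)]
    return phi(d_k), sup_norm(diff)


for k in (1, 2, 10, 100):
    print(k, proof_sequence(k))  # phi(d_k) = 0 and the distance to e_1 is 1/k
```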
Discontinuous Linear Functionals


The second bullet point in Example 6.50 shows that there exists a discontinuous linear functional on a certain normed vector space. Our next major goal is to show that every infinite-dimensional normed vector space has a discontinuous linear functional (see 6.62). Thus infinite-dimensional normed vector spaces behave in this respect much differently from $\mathbf{F}^n$, where all linear functionals are continuous (see Exercise 4).

We need to extend the notion of a basis of a finite-dimensional vector space to an infinite-dimensional context. In a finite-dimensional vector space, we might consider a basis of the form $e_1, \ldots, e_n$, where $n \in \mathbf{Z}^+$ and each $e_k$ is an element of our vector space. We can think of the list $e_1, \ldots, e_n$ as a function from $\{1, \ldots, n\}$ to our vector space, with the value of this function at $k \in \{1, \ldots, n\}$ denoted by $e_k$ with a subscript k instead of by the usual functional notation $e(k)$. To generalize, in the next definition we allow $\{1, \ldots, n\}$ to be replaced by an arbitrary set that might not be a finite set.


6.53 Definition family

A family $\{e_k\}_{k\in\Gamma}$ in a set V is a function e from a set Γ to V, with the value of the function e at $k \in \Gamma$ denoted by $e_k$.


Even though a family in V is a function mapping into V and thus is not a subset of V, the set terminology and the bracket notation $\{e_k\}_{k\in\Gamma}$ are useful, and the range of a family in V really is a subset of V.

We now restate some basic linear algebra concepts, but in the context of vector spaces that might be infinite-dimensional. Note that only finite sums appear in the definition below, even though we might be working with an infinite family.


6.54 Definition linearly independent; span; basis

Suppose $\{e_k\}_{k\in\Gamma}$ is a family in a vector space V.

• $\{e_k\}_{k\in\Gamma}$ is called linearly independent if there does not exist a finite nonempty subset Ω of Γ and a family $\{\alpha_j\}_{j\in\Omega}$ in $\mathbf{F} \setminus \{0\}$ such that
$$\sum_{j\in\Omega} \alpha_j e_j = 0.$$

• The span of $\{e_k\}_{k\in\Gamma}$ is denoted by $\operatorname{span}\{e_k\}_{k\in\Gamma}$ and is defined to be the set of all sums of the form
$$\sum_{j\in\Omega} \alpha_j e_j,$$
where Ω is a finite subset of Γ and $\{\alpha_j\}_{j\in\Omega}$ is a family in F.

• A vector space V is called finite-dimensional if there exists a finite set Γ and a family $\{e_k\}_{k\in\Gamma}$ in V such that $\operatorname{span}\{e_k\}_{k\in\Gamma} = V$.

• A vector space is called infinite-dimensional if it is not finite-dimensional.

• A family in V is called a basis of V if it is linearly independent and its span equals V.
For example, $\{x^n\}_{n\in\{0,1,2,\ldots\}}$ is a basis of the vector space of polynomials.

Our definition of span does not take advantage of the possibility of summing an infinite number of elements in contexts where a notion of limit exists (as is the case in normed vector spaces). When we get to Hilbert spaces in Chapter 8, we consider another kind of basis that does involve infinite sums. As we will soon see, the kind of basis as defined here is just what we need to produce discontinuous linear functionals.

Now we introduce terminology that will be needed in our proof that every vector space has a basis.

The term Hamel basis is sometimes used to denote what has been called a basis here. The use of the term Hamel basis emphasizes that only finite sums are under consideration.

No one has ever produced a concrete example of a basis of an infinite-dimensional Banach space.

6.55 Definition maximal element

Suppose A is a collection of subsets of a set V. A set $\Gamma \in A$ is called a maximal element of A if there does not exist $\Gamma' \in A$ such that $\Gamma \subsetneq \Gamma'$.


6.56 Example maximal elements

For $k \in \mathbf{Z}$, let $k\mathbf{Z}$ denote the set of integer multiples of k; thus $k\mathbf{Z} = \{km : m \in \mathbf{Z}\}$. Let A be the collection of subsets of Z defined by $A = \{k\mathbf{Z} : k = 2, 3, 4, \ldots\}$. Suppose $k \in \mathbf{Z}^+$. Then $k\mathbf{Z}$ is a maximal element of A if and only if k is a prime number, as you should verify.
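A quick way to test the claim in Example 6.56 on small k uses the fact that, for positive integers, $k\mathbf{Z} \subset m\mathbf{Z}$ exactly when m divides k. A strictly larger set $j\mathbf{Z}$ containing $k\mathbf{Z}$ therefore needs a divisor j of k with $2 \le j < k$, so only finitely many j must be checked. The sketch below lists the k up to a limit for which $k\mathbf{Z}$ is maximal; the function names are our own.

```python
def contains(m: int, k: int) -> bool:
    """For positive integers, kZ is a subset of mZ exactly when m divides k."""
    return k % m == 0


def maximal_ks(limit: int) -> list[int]:
    """Return the k in {2, ..., limit} for which kZ is maximal in A = {jZ : j >= 2}."""
    # jZ strictly contains kZ iff j | k and j != k; such j satisfies 2 <= j < k.
    return [k for k in range(2, limit + 1)
            if not any(contains(j, k) for j in range(2, k))]


print(maximal_ks(30))  # the primes up to 30
```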


A subset Γ of a vector space V can be thought of as a family in V by considering $\{e_f\}_{f\in\Gamma}$, where $e_f = f$. With this convention, the next result shows that the bases of V are exactly the maximal elements among the collection of linearly independent subsets of V.


6.57 bases as maximal elements

Suppose V is a vector space. Then a subset of V is a basis of V if and only if it is a maximal element of the collection of linearly independent subsets of V.


Proof Suppose Γ is a linearly independent subset of V.

First suppose also that Γ is a basis of V. If $f \in V$ but $f \notin \Gamma$, then $f \in \operatorname{span}\Gamma$, which implies that $\Gamma \cup \{f\}$ is not linearly independent. Thus Γ is a maximal element among the collection of linearly independent subsets of V, completing one direction of the proof.

To prove the other direction, suppose now that Γ is a maximal element of the collection of linearly independent subsets of V. If $f \in V$ but $f \notin \operatorname{span}\Gamma$, then $\Gamma \cup \{f\}$ is linearly independent, which would contradict the maximality of Γ among the collection of linearly independent subsets of V. Thus $\operatorname{span}\Gamma = V$, which means that Γ is a basis of V, completing the proof in the other direction.
The notion of a chain plays a key role in our next result.


6.58 Definition chain

A collection C of subsets of a set V is called a chain if $\Omega, \Gamma \in C$ implies $\Omega \subset \Gamma$ or $\Gamma \subset \Omega$.


6.59 Example chains

• The collection $C = \{4\mathbf{Z}, 6\mathbf{Z}\}$ of subsets of Z is not a chain because neither of the sets $4\mathbf{Z}$ or $6\mathbf{Z}$ is a subset of the other.

• The collection $C = \{2^n\mathbf{Z} : n \in \mathbf{Z}^+\}$ of subsets of Z is a chain because if $m, n \in \mathbf{Z}^+$, then $2^m\mathbf{Z} \subset 2^n\mathbf{Z}$ or $2^n\mathbf{Z} \subset 2^m\mathbf{Z}$.
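Using the same divisibility test ($k\mathbf{Z} \subset m\mathbf{Z}$ exactly when m divides k, for positive integers), the two examples above can be checked mechanically. The sketch can only test finitely many members, so for the infinite chain it checks an initial segment; `is_chain` is a hypothetical helper name.

```python
from itertools import combinations


def is_chain(ks: list[int]) -> bool:
    """Test whether {kZ : k in ks} is a chain: every pair must be nested (one divides the other)."""
    return all(a % b == 0 or b % a == 0 for a, b in combinations(ks, 2))


print(is_chain([4, 6]))                          # False
print(is_chain([2 ** n for n in range(1, 12)]))  # True (first eleven members of the chain)
```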


The next result follows from the Axiom of Choice, although it is not as intuitively believable as the Axiom of Choice. Because the techniques used to prove the next result are so different from techniques used elsewhere in this book, the reader is asked either to accept this result without proof or to find one of the good proofs available via the internet or in other books. The version of Zorn's Lemma stated here is simpler than the standard more general version, but this version is all that we need.


6.60 Zorn's Lemma

Suppose V is a set and A is a collection of subsets of V with the property that the union of all the sets in C is in A for every chain $C \subset A$. Then A contains a maximal element.


Zorn’s Lemma is named in honor of 
Max Zorn (1906-1993), who 
published a paper containing the 
result in 1935, when he had a 
postdoctoral position at Yale. 


Zorn’s Lemma now allows us to prove that every vector space has a basis. The 
proof does not help us find a concrete basis because Zorn’s Lemma is an existence 
result rather than a constructive technique. 


6.61 bases exist 


Every vector space has a basis. 


Proof Suppose V is a vector space. If C is a chain of linearly independent subsets 
of V, then the union of all the sets in C is also a linearly independent subset of V (this 
holds because linear independence is a condition that is checked by considering finite 
subsets, and each finite subset of the union is contained in one of the elements of the 
chain). 

Thus if A denotes the collection of linearly independent subsets of V, then A 
satisfies the hypothesis of Zorn’s Lemma (6.60). Hence A contains a maximal 
element, which by 6.57 is a basis of V. 
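Zorn's Lemma is needed only because no finite procedure is available in general. In a finite-dimensional space such as $\mathbf{Q}^3$, the same idea, growing a linearly independent set until it is maximal, can be carried out greedily. The sketch below is our own finite-dimensional analogue, not a proof technique from the text. It uses exact Gaussian elimination over the rationals and shows that the maximal independent subset it extracts from a spanning list has rank 3, so it is a basis, as 6.57 predicts.

```python
from fractions import Fraction


def rank(vectors: list[tuple[int, ...]]) -> int:
    """Rank over Q, computed by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def greedy_independent(candidates: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """Keep each candidate that is not in the span of the candidates already kept."""
    kept: list[tuple[int, ...]] = []
    for v in candidates:
        if rank(kept + [v]) > len(kept):
            kept.append(v)
    return kept


candidates = [(1, 1, 0), (2, 2, 0), (0, 1, 1), (1, 2, 1), (0, 0, 1)]  # spans Q^3
basis = greedy_independent(candidates)
print(basis, rank(basis))
```

Because the candidate list spans $\mathbf{Q}^3$, the kept vectors are maximal among independent subsets of the list and also span the space.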
Now we can prove the promised result about the existence of discontinuous linear functionals on every infinite-dimensional normed vector space.


6.62 discontinuous linear functionals

Every infinite-dimensional normed vector space has a discontinuous linear functional.


Proof Suppose V is an infinite-dimensional normed vector space. By 6.61, V has a basis $\{e_k\}_{k\in\Gamma}$. Because V is infinite-dimensional, Γ is not a finite set. Thus we can assume $\mathbf{Z}^+ \subset \Gamma$ (by relabeling a countable subset of Γ).

Define a linear functional $\varphi\colon V \to \mathbf{F}$ by setting $\varphi(e_j)$ equal to $j\|e_j\|$ for $j \in \mathbf{Z}^+$, setting $\varphi(e_j)$ equal to 0 for $j \in \Gamma \setminus \mathbf{Z}^+$, and extending linearly. More precisely, define a linear functional $\varphi\colon V \to \mathbf{F}$ by

$$\varphi\Bigl(\sum_{j\in\Omega} \alpha_j e_j\Bigr) = \sum_{j\in\Omega\cap\mathbf{Z}^+} \alpha_j\, j\, \|e_j\|$$

for every finite subset $\Omega \subset \Gamma$ and every family $\{\alpha_j\}_{j\in\Omega}$ in F.

Because $\varphi(e_j) = j\|e_j\|$ and $e_j \neq 0$ for each $j \in \mathbf{Z}^+$, we have $|\varphi(e_j)|/\|e_j\| = j$, so the linear functional φ is unbounded, completing the proof.


Hahn—Banach Theorem 


In the last subsection, we showed that there exists a discontinuous linear functional 
on each infinite-dimensional normed vector space. Now we turn our attention to the 
existence of continuous linear functionals. 

The existence of a nonzero continuous linear functional on each Banach space is 
not obvious. For example, consider the Banach space $\ell^\infty/c_0$, where $\ell^\infty$ is the Banach space of bounded sequences in F with

$$\|(a_1, a_2, \ldots)\|_\infty = \sup_{k\in\mathbf{Z}^+} |a_k|$$

and $c_0$ is the subspace of $\ell^\infty$ consisting of those sequences in F that have limit 0. The quotient space $\ell^\infty/c_0$ is an infinite-dimensional Banach space (see Exercise 15 in Section 6C). However, no one has ever exhibited a concrete nonzero linear functional on the Banach space $\ell^\infty/c_0$.

In this subsection, we show that infinite-dimensional normed vector spaces have 
plenty of continuous linear functionals. We do this by showing that a bounded linear 
functional on a subspace of a normed vector space can be extended to a bounded 
linear functional on the whole space without increasing its norm—this result is called 
the Hahn—Banach Theorem (6.69). 

Completeness plays no role in this topic. Thus this subsection deals with normed 
vector spaces instead of Banach spaces. 

We do most of the work needed to prove the Hahn—Banach Theorem in the next 
lemma, which shows that we can extend a linear functional to a subspace generated 
by one additional element, without increasing the norm. This one-element-at-a-time 
approach, when combined with a maximal object produced by Zorn’s Lemma, gives 
us the desired extension to the full normed vector space. 
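If V is a real vector space, U is a subspace of V, and $h \in V$, then $U + \mathbf{R}h$ is the subspace of V defined by

$$U + \mathbf{R}h = \{f + \alpha h : f \in U \text{ and } \alpha \in \mathbf{R}\}.$$


6.63 Extension Lemma

Suppose V is a real normed vector space, U is a subspace of V, and $\psi\colon U \to \mathbf{R}$ is a bounded linear functional. Suppose $h \in V \setminus U$. Then ψ can be extended to a bounded linear functional $\varphi\colon U + \mathbf{R}h \to \mathbf{R}$ such that $\|\varphi\| = \|\psi\|$.


Proof Suppose $c \in \mathbf{R}$. Define $\varphi(h)$ to be c, and then extend φ linearly to $U + \mathbf{R}h$. Specifically, define $\varphi\colon U + \mathbf{R}h \to \mathbf{R}$ by

$$\varphi(f + \alpha h) = \psi(f) + \alpha c$$

for $f \in U$ and $\alpha \in \mathbf{R}$. Because $h \notin U$, each element of $U + \mathbf{R}h$ has a unique representation $f + \alpha h$, so φ is well defined; it is a linear functional on $U + \mathbf{R}h$.

Clearly $\varphi|_U = \psi$. Thus $\|\varphi\| \ge \|\psi\|$. We need to show that for some choice of $c \in \mathbf{R}$, the linear functional φ defined above satisfies the equation $\|\varphi\| = \|\psi\|$. In other words, we want

6.64 $$|\psi(f) + \alpha c| \le \|\psi\|\,\|f + \alpha h\| \quad\text{for all } f \in U \text{ and all } \alpha \in \mathbf{R}.$$

It would be enough to have

6.65 $$|\psi(f) + c| \le \|\psi\|\,\|f + h\| \quad\text{for all } f \in U,$$

because replacing f by $f/\alpha$ in the last inequality and then multiplying both sides by $|\alpha|$ would give 6.64 for $\alpha \neq 0$ (when $\alpha = 0$, 6.64 is just $|\psi(f)| \le \|\psi\|\,\|f\|$).

Rewriting 6.65, we want to show that there exists $c \in \mathbf{R}$ such that

$$-\|\psi\|\,\|f + h\| \le \psi(f) + c \le \|\psi\|\,\|f + h\| \quad\text{for all } f \in U.$$

Equivalently, we want to show that there exists $c \in \mathbf{R}$ such that

$$-\|\psi\|\,\|f + h\| - \psi(f) \le c \le \|\psi\|\,\|f + h\| - \psi(f) \quad\text{for all } f \in U.$$

The existence of $c \in \mathbf{R}$ satisfying the line above follows from the inequality

6.66 $$\sup_{f\in U}\bigl(-\|\psi\|\,\|f + h\| - \psi(f)\bigr) \le \inf_{g\in U}\bigl(\|\psi\|\,\|g + h\| - \psi(g)\bigr).$$

To prove the inequality above, suppose $f, g \in U$. Then

$$\begin{aligned}
-\|\psi\|\,\|f + h\| - \psi(f) &\le \|\psi\|\bigl(\|g + h\| - \|g - f\|\bigr) - \psi(f)\\
&= \|\psi\|\,\|g + h\| - \|\psi\|\,\|g - f\| + \psi(g - f) - \psi(g)\\
&\le \|\psi\|\,\|g + h\| - \psi(g),
\end{aligned}$$

where the first inequality follows from the triangle inequality $\|g - f\| \le \|g + h\| + \|f + h\|$ and the last inequality holds because $\psi(g - f) \le \|\psi\|\,\|g - f\|$. Taking the supremum over $f \in U$ and the infimum over $g \in U$ gives 6.66, which completes the proof.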
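To see 6.66 at work, here is a small numerical sketch under assumptions of our own choosing: $V = \mathbf{R}^2$, $U = \mathbf{R} \times \{0\}$, $\psi(x, 0) = x$ (so $\|\psi\| = 1$ for both norms below), and $h = (0, 1)$. The supremum and infimum in 6.66 are estimated on a finite grid of f and g. For these two norms the extremes happen to be attained at grid points, so the printed values are exact. With $\|\cdot\|_\infty$ the admissible c is forced to be 0, while with $\|\cdot\|_1$ every $c \in [-1, 1]$ works, so the extension in the Extension Lemma need not be unique.

```python
from fractions import Fraction


def norm_inf(v):
    return max(abs(v[0]), abs(v[1]))


def norm_1(v):
    return abs(v[0]) + abs(v[1])


def admissible_interval(norm, grid):
    """Grid estimates of the two sides of 6.66 for psi(x, 0) = x and h = (0, 1)."""
    lower = max(-norm((x, 1)) - x for x in grid)  # sup over f = (x, 0)
    upper = min(norm((x, 1)) - x for x in grid)   # inf over g = (x, 0)
    return lower, upper


def check_extension(norm, c, grid):
    """Check 6.64, |psi(f) + a c| <= ||f + a h||, for f = (x, 0) and a on the grid."""
    return all(abs(x + a * c) <= norm((x, a)) for x in grid for a in grid)


grid = [Fraction(k, 4) for k in range(-40, 41)]
print(admissible_interval(norm_inf, grid))  # lower = upper = 0, so c = 0 is forced
print(admissible_interval(norm_1, grid))    # lower = -1, upper = 1
print(check_extension(norm_1, Fraction(1, 2), grid))    # True
print(check_extension(norm_inf, Fraction(1, 2), grid))  # False, e.g. x = a = 1
```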
Because our simplified form of Zorn's Lemma deals with set inclusions rather than more general orderings, we need to use the notion of the graph of a function.


6.67 Definition graph

Suppose $T\colon V \to W$ is a function from a set V to a set W. Then the graph of T is denoted graph(T) and is the subset of $V \times W$ defined by

$$\operatorname{graph}(T) = \{(f, T(f)) \in V \times W : f \in V\}.$$


Formally, a function from a set V to a set W equals its graph as defined above. However, because we usually think of a function more intuitively as a mapping, the separate notion of the graph of a function remains useful.

The easy proof of the next result is left to the reader. Part (a) below uses the vector space structure of $V \times W$, which is a vector space with natural operations of addition and scalar multiplication, as given in Exercise 10 in Section 6B.


6.68 function properties in terms of graphs

Suppose V and W are normed vector spaces and $T\colon V \to W$ is a function.

(a) T is a linear map if and only if graph(T) is a subspace of $V \times W$.

(b) Suppose $U \subset V$ and $S\colon U \to W$ is a function. Then T is an extension of S if and only if $\operatorname{graph}(S) \subset \operatorname{graph}(T)$.

(c) If $T\colon V \to W$ is a linear map and $c \in [0, \infty)$, then $\|T\| \le c$ if and only if $\|g\| \le c\|f\|$ for all $(f, g) \in \operatorname{graph}(T)$.


The proof of the Extension Lemma (6.63) used inequalities that do not make sense when $\mathbf{F} = \mathbf{C}$. Thus the proof of the Hahn–Banach Theorem below requires some extra steps when $\mathbf{F} = \mathbf{C}$.

Hans Hahn (1879-1934) was a student and later a faculty member at the University of Vienna, where one of his PhD students was Kurt Gödel (1906-1978).


6.69 Hahn–Banach Theorem

Suppose V is a normed vector space, U is a subspace of V, and $\psi\colon U \to \mathbf{F}$ is a bounded linear functional. Then ψ can be extended to a bounded linear functional on V whose norm equals $\|\psi\|$.


Proof First we consider the case where $\mathbf{F} = \mathbf{R}$. Let A be the collection of subsets E of $V \times \mathbf{R}$ that satisfy all the following conditions:

• $E = \operatorname{graph}(\varphi)$ for some linear functional φ on some subspace of V;
• $\operatorname{graph}(\psi) \subset E$;
• $|\alpha| \le \|\psi\|\,\|f\|$ for every $(f, \alpha) \in E$.
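Then A satisfies the hypothesis of Zorn's Lemma (6.60). Thus A has a maximal element. The Extension Lemma (6.63) implies that this maximal element is the graph of a linear functional defined on all of V. This linear functional is an extension of ψ to V and it has norm $\|\psi\|$, completing the proof in the case where $\mathbf{F} = \mathbf{R}$.


Now consider the case where $\mathbf{F} = \mathbf{C}$. Define $\psi_1\colon U \to \mathbf{R}$ by

$$\psi_1(f) = \operatorname{Re}\psi(f)$$

for $f \in U$. Then $\psi_1$ is an R-linear map from U to R and $\|\psi_1\| \le \|\psi\|$ (actually $\|\psi_1\| = \|\psi\|$, but we need only the inequality). Also,

6.70 $$\begin{aligned}
\psi(f) &= \operatorname{Re}\psi(f) + i\operatorname{Im}\psi(f)\\
&= \psi_1(f) + i\operatorname{Im}\bigl(-i\psi(if)\bigr)\\
&= \psi_1(f) - i\operatorname{Re}\psi(if)\\
&= \psi_1(f) - i\psi_1(if)
\end{aligned}$$

for all $f \in U$.

Temporarily forget that complex scalar multiplication makes sense on V and temporarily think of V as a real normed vector space. The case of the result that we have already proved then implies that there exists an extension $\varphi_1$ of $\psi_1$ to an R-linear functional $\varphi_1\colon V \to \mathbf{R}$ with $\|\varphi_1\| = \|\psi_1\| \le \|\psi\|$.

Motivated by 6.70, we define $\varphi\colon V \to \mathbf{C}$ by

$$\varphi(f) = \varphi_1(f) - i\varphi_1(if)$$

for $f \in V$. The equation above and 6.70 imply that φ is an extension of ψ to V. The equation above also implies that $\varphi(f + g) = \varphi(f) + \varphi(g)$ and $\varphi(\alpha f) = \alpha\varphi(f)$ for all $f, g \in V$ and all $\alpha \in \mathbf{R}$. Also,

$$\varphi(if) = \varphi_1(if) - i\varphi_1(-f) = \varphi_1(if) + i\varphi_1(f) = i\bigl(\varphi_1(f) - i\varphi_1(if)\bigr) = i\varphi(f).$$

The reader should use the equation above to show that φ is a C-linear map.

The only part of the proof that remains is to show that $\|\varphi\| \le \|\psi\|$. To do this, note that

$$|\varphi(f)|^2 = \varphi\bigl(\overline{\varphi(f)}\,f\bigr) = \varphi_1\bigl(\overline{\varphi(f)}\,f\bigr) \le \|\varphi_1\|\,\bigl\|\overline{\varphi(f)}\,f\bigr\| \le \|\psi\|\,|\varphi(f)|\,\|f\|$$

for all $f \in V$, where the second equality holds because $\varphi\bigl(\overline{\varphi(f)}\,f\bigr) \in \mathbf{R}$ and $\varphi_1 = \operatorname{Re}\varphi$. Dividing by $|\varphi(f)|$, we see from the line above that $|\varphi(f)| \le \|\psi\|\,\|f\|$ for all $f \in V$ (no division necessary if $\varphi(f) = 0$). This implies that $\|\varphi\| \le \|\psi\|$; because φ extends ψ, also $\|\varphi\| \ge \|\psi\|$, completing the proof.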
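Identity 6.70 says that a C-linear functional is determined by its real part. The sketch below checks this numerically on $\mathbf{C}^2$ for an example functional whose coefficients we picked arbitrarily (they are not from the text). The function `rebuilt` implements $f \mapsto \psi_1(f) - i\psi_1(if)$ exactly as in the definition of φ above.

```python
def phi(z: tuple[complex, complex]) -> complex:
    """An example C-linear functional on C^2 (coefficients chosen arbitrarily)."""
    return (1 + 2j) * z[0] + (3 - 1j) * z[1]


def phi1(z: tuple[complex, complex]) -> float:
    """Real part of phi: an R-linear functional on C^2 viewed as a real vector space."""
    return phi(z).real


def rebuilt(z: tuple[complex, complex]) -> complex:
    """Recover phi from its real part via 6.70: phi(f) = phi1(f) - i phi1(if)."""
    iz = (1j * z[0], 1j * z[1])
    return phi1(z) - 1j * phi1(iz)


samples = [(1 + 0j, 0j), (2 - 1j, 0.5 + 3j), (-4j, 7 + 0j)]
for z in samples:
    print(z, phi(z), rebuilt(z))  # the last two columns agree up to rounding
```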


We have given the special name linear functionals to linear maps into the scalar 
field F. The vector space of bounded linear functionals now also gets a special name 
and a special notation. 


6.71 Definition dual space

Suppose V is a normed vector space. Then the dual space of V, denoted $V'$, is the normed vector space consisting of the bounded linear functionals on V. In other words, $V' = \mathcal{B}(V, \mathbf{F})$.


By 6.47, the dual space of every normed vector space is a Banach space. 
6.72 $\|f\| = \max\{|\varphi(f)| : \varphi \in V' \text{ and } \|\varphi\| = 1\}$

Suppose V is a normed vector space and $f \in V \setminus \{0\}$. Then there exists $\varphi \in V'$ such that $\|\varphi\| = 1$ and $\|f\| = \varphi(f)$.


Proof Let U be the 1-dimensional subspace of V defined by

$$U = \{\alpha f : \alpha \in \mathbf{F}\}.$$

Define $\psi\colon U \to \mathbf{F}$ by

$$\psi(\alpha f) = \alpha\|f\|$$

for $\alpha \in \mathbf{F}$. Then ψ is a linear functional on U with $\|\psi\| = 1$ and $\psi(f) = \|f\|$. The Hahn–Banach Theorem (6.69) implies that there exists an extension of ψ to a linear functional φ on V with $\|\varphi\| = 1$, completing the proof.
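In finite dimensions a functional as in 6.72 can be written down directly. The sketch below assumes $V = \mathbf{R}^n$ with $\|\cdot\|_\infty$ and relies on the standard fact that the functional $x \mapsto \sum_i c_i x_i$ then has norm $\sum_i |c_i|$. Picking an index k where $|f_k|$ is largest and setting $\varphi(x) = \operatorname{sign}(f_k)\,x_k$ gives $\|\varphi\| = 1$ and $\varphi(f) = \|f\|_\infty$. The function names are hypothetical.

```python
def norming_functional(f: list[float]) -> list[float]:
    """Coefficients c of phi(x) = sum c_i x_i with sum |c_i| = 1 and phi(f) = ||f||_inf."""
    k = max(range(len(f)), key=lambda i: abs(f[i]))  # an index with |f_k| = ||f||_inf
    c = [0.0] * len(f)
    c[k] = 1.0 if f[k] >= 0 else -1.0
    return c


def apply(c: list[float], x: list[float]) -> float:
    return sum(ci * xi for ci, xi in zip(c, x))


f = [3.0, -7.5, 2.0]
c = norming_functional(f)
print(c, apply(c, f), sum(abs(ci) for ci in c))  # [0.0, -1.0, 0.0] 7.5 1.0
```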


The next result gives another beautiful application of the Hahn—Banach Theorem, 
with a useful necessary and sufficient condition for an element of a normed vector 
space to be in the closure of a subspace. 


6.73 condition to be in the closure of a subspace

Suppose U is a subspace of a normed vector space V and $h \in V$. Then $h \in \overline{U}$ if and only if $\varphi(h) = 0$ for every $\varphi \in V'$ such that $\varphi|_U = 0$.


Proof First suppose $h \in \overline{U}$. If $\varphi \in V'$ and $\varphi|_U = 0$, then $\varphi(h) = 0$ by the continuity of φ, completing the proof in one direction.

To prove the other direction, suppose now that $h \notin \overline{U}$. Define $\psi\colon U + \mathbf{F}h \to \mathbf{F}$ by

$$\psi(f + \alpha h) = \alpha$$

for $f \in U$ and $\alpha \in \mathbf{F}$ (this is well defined because $h \notin U$). Then ψ is a linear functional on $U + \mathbf{F}h$ with $\operatorname{null}\psi = U$ and $\psi(h) = 1$.

Because $h \notin \overline{U}$, the closure of the null space of ψ does not equal $U + \mathbf{F}h$. Thus 6.52 implies that ψ is a bounded linear functional on $U + \mathbf{F}h$.

The Hahn–Banach Theorem (6.69) implies that ψ can be extended to a bounded linear functional φ on V. Thus we have found $\varphi \in V'$ such that $\varphi|_U = 0$ but $\varphi(h) \neq 0$, completing the proof in the other direction.


EXERCISES 6D


1 Suppose V is a normed vector space and φ is a linear functional on V. Suppose $\alpha \in \mathbf{F} \setminus \{0\}$. Prove that the following are equivalent:

(a) φ is a bounded linear functional.

(b) $\varphi^{-1}(\alpha)$ is a closed subset of V.

(c) $\overline{\varphi^{-1}(\alpha)} \neq V$.

2 Suppose φ is a linear functional on a vector space V. Prove that if U is a subspace of V such that $\operatorname{null}\varphi \subset U$, then $U = \operatorname{null}\varphi$ or $U = V$.

3 Suppose φ and ψ are linear functionals on the same vector space. Prove that

$$\operatorname{null}\varphi \subset \operatorname{null}\psi$$

if and only if there exists $\alpha \in \mathbf{F}$ such that $\psi = \alpha\varphi$.


For the next two exercises, $\mathbf{F}^n$ should be endowed with the norm $\|\cdot\|_\infty$ as defined in Example 6.34.

4 Suppose $n \in \mathbf{Z}^+$ and V is a normed vector space. Prove that every linear map from $\mathbf{F}^n$ to V is continuous.

5 Suppose $n \in \mathbf{Z}^+$, V is a normed vector space, and $T\colon \mathbf{F}^n \to V$ is a linear map that is one-to-one and onto V.

(a) Show that
$$\inf\{\|Tx\| : x \in \mathbf{F}^n \text{ and } \|x\|_\infty = 1\} > 0.$$

(b) Prove that $T^{-1}\colon V \to \mathbf{F}^n$ is a bounded linear map.

6 Suppose $n \in \mathbf{Z}^+$.

(a) Prove that all norms on $\mathbf{F}^n$ have the same convergent sequences, the same open sets, and the same closed sets.

(b) Prove that all norms on $\mathbf{F}^n$ make $\mathbf{F}^n$ into a Banach space.

7 Suppose V and W are normed vector spaces and V is finite-dimensional. Prove that every linear map from V to W is continuous.

8 Prove that every finite-dimensional normed vector space is a Banach space.

9 Prove that every finite-dimensional subspace of each normed vector space is closed.

10 Give a concrete example of an infinite-dimensional normed vector space and a basis of that normed vector space.


11 Show that the collection $A = \{k\mathbf{Z} : k = 2, 3, 4, \ldots\}$ of subsets of Z satisfies the hypothesis of Zorn's Lemma (6.60).

12 Prove that every linearly independent family in a vector space can be extended to a basis of the vector space.

13 Suppose V is a normed vector space, U is a subspace of V, and $\psi\colon U \to \mathbf{R}$ is a bounded linear functional. Prove that ψ has a unique extension to a bounded linear functional φ on V with $\|\varphi\| = \|\psi\|$ if and only if

$$\sup_{f\in U}\bigl(-\|\psi\|\,\|f + h\| - \psi(f)\bigr) = \inf_{g\in U}\bigl(\|\psi\|\,\|g + h\| - \psi(g)\bigr)$$

for every $h \in V \setminus U$.
14 Show that there exists a linear functional $\varphi\colon \ell^\infty \to \mathbf{F}$ such that

$$|\varphi(a_1, a_2, \ldots)| \le \|(a_1, a_2, \ldots)\|_\infty$$

for all $(a_1, a_2, \ldots) \in \ell^\infty$ and

$$\varphi(a_1, a_2, \ldots) = \lim_{k\to\infty} a_k$$

for all $(a_1, a_2, \ldots) \in \ell^\infty$ such that the limit above on the right exists.

15 Suppose B is an open ball in a normed vector space V such that $0 \notin B$. Prove that there exists $\varphi \in V'$ such that

$$\operatorname{Re}\varphi(f) > 0$$

for all $f \in B$.

16 Show that the dual space of each infinite-dimensional normed vector space is infinite-dimensional.


A normed vector space is called separable if it has a countable subset whose closure equals the whole space.
17 Suppose V is a separable normed vector space. Explain how the Hahn–Banach Theorem (6.69) for V can be proved without using any results (such as Zorn's Lemma) that depend upon the Axiom of Choice.

18 Suppose V is a normed vector space such that the dual space $V'$ is a separable Banach space. Prove that V is separable.

19 Prove that the dual of the Banach space $C([0,1])$ is not separable; here the norm on $C([0,1])$ is defined by $\|f\| = \sup_{[0,1]} |f|$.


The double dual space of a normed vector space is defined to be the dual space of the dual space. If V is a normed vector space, then the double dual space of V is denoted by $V''$; thus $V'' = (V')'$. The norm on $V''$ is defined to be the norm it receives as the dual space of $V'$.


20 Define $\Phi\colon V \to V''$ by

$$(\Phi f)(\varphi) = \varphi(f)$$

for $f \in V$ and $\varphi \in V'$. Show that $\|\Phi f\| = \|f\|$ for every $f \in V$.
[The map Φ defined above is called the canonical isometry of V into $V''$.]

21 Suppose V is an infinite-dimensional normed vector space. Show that there is a convex subset U of V such that $\overline{U} = V$ and such that the complement $V \setminus U$ is also a convex subset of V with $\overline{V \setminus U} = V$.

[See 8.25 for the definition of a convex set. This exercise should stretch your geometric intuition because this behavior cannot happen in finite dimensions.]
6E Consequences of Baire's Theorem


This section focuses on several important results about Banach spaces that depend upon Baire's Theorem. This result was first proved by René-Louis Baire (1874-1932) as part of his 1899 doctoral dissertation at École Normale Supérieure (Paris).

Even though our interest lies primarily in applications to Banach spaces, the proper setting for Baire's Theorem is the more general context of complete metric spaces.

The result here called Baire's Theorem is often called the Baire Category Theorem. This book uses the shorter name of this result because we do not need the categories introduced by Baire. Furthermore, the use of the word category in this context can be confusing because Baire's categories have no connection with the category theory that developed decades after Baire's work.


Baire’s Theorem 


We begin with some key topological notions. 


Suppose U is a subset of a metric space V. The interior of U, denoted int U, is 
the set of f € U such that some open ball of V centered at f with positive radius 
is contained in U. 


You should verify the following elementary facts about the interior.

• The interior of each subset of a metric space is open.

• The interior of a subset U of a metric space V is the largest open subset of V contained in U.


A subset U of a metric space V is called dense in V if $\overline{U} = V$.


For example, Q and $\mathbf{R} \setminus \mathbf{Q}$ are both dense in R, where R has its standard metric $d(x, y) = |x - y|$.


You should verify the following elementary facts about dense subsets.

• A subset U of a metric space V is dense in V if and only if every nonempty open subset of V contains at least one element of U.

• A subset U of a metric space V has an empty interior if and only if $V \setminus U$ is dense in V.


The proof of the next result uses the following fact, which you should first prove: If G is an open subset of a metric space V and $f \in G$, then there exists $r > 0$ such that $\overline{B}(f, r) \subset G$.
6.76 Baire’s Theorem


(a) A complete metric space is not the countable union of closed subsets with 
empty interior. 


(b) The countable intersection of dense open subsets of a complete metric space 
is nonempty. 


Proof We will prove (b) and then use (b) to prove (a). 

To prove (b), suppose $(V,d)$ is a complete metric space and $G_1, G_2, \ldots$ is a sequence of dense open subsets of $V$. We need to show that $\bigcap_{k=1}^\infty G_k \neq \varnothing$.

Let $f_1 \in G_1$ and let $r_1 \in (0,1)$ be such that $\overline{B}(f_1,r_1) \subset G_1$. Now suppose $n \in \mathbf{Z}^+$, and $f_1, \ldots, f_n$ and $r_1, \ldots, r_n$ have been chosen such that


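6.77 $\overline{B}(f_1,r_1) \supset \overline{B}(f_2,r_2) \supset \cdots \supset \overline{B}(f_n,r_n)$
and
6.78 $r_j \in (0, \frac{1}{j})$ and $\overline{B}(f_j,r_j) \subset G_j$ for $j = 1, \ldots, n$.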


Because $B(f_n,r_n)$ is an open subset of $V$ and $G_{n+1}$ is dense in $V$, there exists $f_{n+1} \in B(f_n,r_n) \cap G_{n+1}$. Let $r_{n+1} \in (0, \frac{1}{n+1})$ be such that
$$\overline{B}(f_{n+1},r_{n+1}) \subset B(f_n,r_n) \cap G_{n+1}.$$
Thus we inductively construct a sequence $f_1, f_2, \ldots$ that satisfies 6.77 and 6.78 for all $n \in \mathbf{Z}^+$.
If $j \in \mathbf{Z}^+$, then 6.77 and 6.78 imply that

6.79 $f_k \in \overline{B}(f_j,r_j)$ and $d(f_j,f_k) \le r_j < \frac{1}{j}$ for all $k > j$.

Hence $f_1, f_2, \ldots$ is a Cauchy sequence. Because $(V,d)$ is a complete metric space, there exists $f \in V$ such that $\lim_{k\to\infty} f_k = f$.

Now 6.79 and 6.78 imply that for each $j \in \mathbf{Z}^+$, we have $f \in \overline{B}(f_j,r_j) \subset G_j$. Hence $f \in \bigcap_{k=1}^\infty G_k$, which means that $\bigcap_{k=1}^\infty G_k$ is not the empty set, completing the proof of (b).

To prove (a), suppose $(V,d)$ is a complete metric space and $F_1, F_2, \ldots$ is a sequence of closed subsets of $V$ with empty interior. Then $V \setminus F_1, V \setminus F_2, \ldots$ is a sequence of dense open subsets of $V$. Now (b) implies that
$$\varnothing \neq \bigcap_{k=1}^\infty (V \setminus F_k).$$
Taking complements of both sides above, we conclude that
$$V \neq \bigcup_{k=1}^\infty F_k,$$
completing the proof of (a).
Because
$$\mathbf{R} = \bigcup_{x \in \mathbf{R}} \{x\}$$
and each set $\{x\}$ is closed and has empty interior in $\mathbf{R}$, Baire’s Theorem implies $\mathbf{R}$ is uncountable. Thus we have yet another proof that $\mathbf{R}$ is uncountable, different from Cantor’s original diagonal proof and different from the proof via measure theory (see 2.17).
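The nested-ball construction in the proof of 6.76(b) can be carried out by hand when $V = \mathbf{R}$ and each dense open set $G_k = \mathbf{R} \setminus \{x_k\}$ removes a single point. The sketch below does this for a finite list of points using exact rational arithmetic. The function name `baire_nested_balls`, the starting region $B(0,1)$, and the particular choices of centers and radii are illustrative assumptions, not part of the text. The final center lies in every closed ball built so far, so it avoids every listed point, which mirrors why no sequence $x_1, x_2, \ldots$ can exhaust $\mathbf{R}$.

```python
from fractions import Fraction

def baire_nested_balls(points):
    """Nested balls from the proof of 6.76(b), specialized (as an illustration)
    to V = R with d(x, y) = |x - y| and the dense open sets
    G_k = R minus {points[k-1]}.

    Returns pairs (f_k, r_k) such that the closed ball [f_k - r_k, f_k + r_k]
    lies inside the open ball B(f_{k-1}, r_{k-1}), misses points[k-1],
    and has radius r_k < 1/k.
    """
    # Assumption: the first center is chosen inside the open ball B(0, 1).
    c, r = Fraction(0), Fraction(1)
    balls = []
    for k, x in enumerate(points, start=1):
        x = Fraction(x)
        # Pick f_k in B(c, r) with f_k != x; such a point exists because G_k is dense.
        f = c if c != x else c + r / 2
        # Halving the available room keeps the closed ball inside B(c, r)
        # and away from x; the last term forces r_k < 1/k.
        s = min((r - abs(f - c)) / 2, abs(f - x) / 2, Fraction(1, 2 * k))
        balls.append((f, s))
        c, r = f, s
    return balls

points = [Fraction(0), Fraction(1, 2), Fraction(-1, 2), Fraction(1, 3), Fraction(1, 4)]
balls = baire_nested_balls(points)
last_center = balls[-1][0]
print(last_center)  # 5/8, which is not in the list of points
```

Each step uses only the previous ball and the next point, so for an infinite enumeration the centers would form the Cauchy sequence $f_1, f_2, \ldots$ of the proof.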

The next result is another nice consequence of Baire’s Theorem. 


6.80 the set of irrational numbers is not a countable union of closed sets 


There does not exist a countable collection of closed subsets of R whose union 
equals R \ Q. 


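Proof This will be a proof by contradiction. Suppose $F_1, F_2, \ldots$ is a countable collection of closed subsets of $\mathbf{R}$ whose union equals $\mathbf{R} \setminus \mathbf{Q}$. Thus each $F_k$ contains no rational numbers, which implies that each $F_k$ has empty interior (because every nonempty open subset of $\mathbf{R}$ contains a rational number). Now
$$\mathbf{R} = \Bigl(\bigcup_{k=1}^\infty F_k\Bigr) \cup \Bigl(\bigcup_{x \in \mathbf{Q}} \{x\}\Bigr).$$
The equation above writes the complete metric space $\mathbf{R}$ as a countable union of closed sets with empty interior, which contradicts Baire’s Theorem [6.76(a)]. This contradiction completes the proof.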


Open Mapping Theorem and Inverse Mapping Theorem 


The next result shows that a surjective bounded linear map from one Banach space 
onto another Banach space maps open sets to open sets. As shown in Exercises 10 
and 11, this result can fail if the hypothesis that both spaces are Banach spaces is 
weakened to allow either of the spaces to be a normed vector space. 


6.81 Open Mapping Theorem 


Suppose V and W are Banach spaces and T is a bounded linear map of V onto W. 
Then T(G) is an open subset of W for every open subset G of V. 


Proof Let $B$ denote the open unit ball $B(0,1) = \{f \in V : \|f\| < 1\}$ of $V$. For any open ball $B(f,a)$ in $V$, the linearity of $T$ implies that
$$T\bigl(B(f,a)\bigr) = Tf + aT(B).$$
Suppose $G$ is an open subset of $V$. If $f \in G$, then there exists $a > 0$ such that $B(f,a) \subset G$. If we can show that $0 \in \operatorname{int} T(B)$, then the equation above shows that $Tf \in \operatorname{int} T\bigl(B(f,a)\bigr)$. This would imply that $T(G)$ is an open subset of $W$. Thus to complete the proof we need only show that $T(B)$ contains some open ball centered at 0.
The surjectivity and linearity of $T$ imply that
$$W = \bigcup_{k=1}^\infty T(kB) = \bigcup_{k=1}^\infty kT(B).$$
Thus $W = \bigcup_{k=1}^\infty k\overline{T(B)}$. Baire’s Theorem [6.76(a)] now implies that $k\overline{T(B)}$ has a nonempty interior for some $k \in \mathbf{Z}^+$. The linearity of $T$ allows us to conclude that $\overline{T(B)}$ has a nonempty interior.

Thus there exists $g \in B$ such that $Tg \in \operatorname{int} \overline{T(B)}$. Hence
$$0 \in \operatorname{int} \overline{T(B - g)} \subset \operatorname{int} \overline{T(2B)} = \operatorname{int} 2\overline{T(B)}.$$
Thus there exists $r > 0$ such that $\overline{B}(0,2r) \subset 2\overline{T(B)}$ [here $\overline{B}(0,2r)$ is the closed ball in $W$ centered at 0 with radius $2r$]. Hence $\overline{B}(0,r) \subset \overline{T(B)}$. The definition of what it means to be in the closure of $T(B)$ [see 6.7] now shows that
$$h \in W \text{ and } \|h\| \le r \text{ and } \varepsilon > 0 \implies \exists f \in B \text{ such that } \|h - Tf\| < \varepsilon.$$


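For arbitrary $h \neq 0$ in $W$, applying the result in the line above to $\frac{r}{\|h\|}h$ shows that

6.82 $h \in W$ and $\varepsilon > 0 \implies \exists f \in \frac{\|h\|}{r}B$ such that $\|h - Tf\| < \varepsilon$.

(For $h = 0$, take $f = 0$.)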


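Now suppose $g \in W$ and $\|g\| < 1$. Applying 6.82 with $h = g$ and $\varepsilon = \frac{1}{2}$, we see that
there exists $f_1 \in \frac{1}{r}B$ such that $\|g - Tf_1\| < \frac{1}{2}$.

Now applying 6.82 with $h = g - Tf_1$ and $\varepsilon = \frac{1}{4}$, we see that
there exists $f_2 \in \frac{1}{2r}B$ such that $\|g - Tf_1 - Tf_2\| < \frac{1}{4}$.
Applying 6.82 again, this time with $h = g - Tf_1 - Tf_2$ and $\varepsilon = \frac{1}{8}$, we see that
there exists $f_3 \in \frac{1}{4r}B$ such that $\|g - Tf_1 - Tf_2 - Tf_3\| < \frac{1}{8}$.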


Continue in this pattern, constructing a sequence $f_1, f_2, \ldots$ in $V$. Let
$$f = \sum_{k=1}^\infty f_k,$$
where the infinite sum converges in $V$ because
$$\sum_{k=1}^\infty \|f_k\| < \sum_{k=1}^\infty \frac{1}{2^{k-1}r} = \frac{2}{r};$$
here we are using 6.41 (this is the place in the proof where we use the hypothesis that $V$ is a Banach space). The inequality displayed above shows that $\|f\| < \frac{2}{r}$.
Because
$$\|g - Tf_1 - Tf_2 - \cdots - Tf_n\| < \frac{1}{2^n}$$
and because $T$ is a continuous linear map, we have $g = Tf$.
We have now shown that $B(0,1) \subset \frac{2}{r}T(B)$. Thus $\frac{r}{2}B(0,1) \subset T(B)$, completing the proof.
The next result provides the useful information that if a bounded linear map from one Banach space to another Banach space has an algebraic inverse (meaning that the linear map is injective and surjective), then the inverse mapping is automatically bounded.

The Open Mapping Theorem was first proved by Banach and his colleague Juliusz Schauder (1899-1943) in 1929-1930.


6.83 Bounded Inverse Theorem 


Suppose $V$ and $W$ are Banach spaces and $T$ is a one-to-one bounded linear map from $V$ onto $W$. Then $T^{-1}$ is a bounded linear map from $W$ onto $V$.


Proof The verification that $T^{-1}$ is a linear map from $W$ to $V$ is left to the reader. To prove that $T^{-1}$ is bounded, suppose $G$ is an open subset of $V$. Then
$$(T^{-1})^{-1}(G) = T(G).$$
By the Open Mapping Theorem (6.81), $T(G)$ is an open subset of $W$. Thus the equation above shows that the inverse image under the function $T^{-1}$ of every open set is open. By the equivalence of parts (a) and (c) of 6.11, this implies that $T^{-1}$ is continuous. Thus $T^{-1}$ is a bounded linear map (by 6.48).


The result above shows that completeness for normed vector spaces sometimes 
plays a role analogous to compactness for metric spaces (think of the theorem stating 
that a continuous one-to-one function from a compact metric space onto another 
compact metric space has an inverse that is also continuous). 


Closed Graph Theorem 


Suppose $V$ and $W$ are normed vector spaces. Then $V \times W$ is a vector space with the natural operations of addition and scalar multiplication as defined in Exercise 10 in Section 6B. There are several natural norms on $V \times W$ that make $V \times W$ into a normed vector space; the choice used in the next result seems to be the easiest. The proof of the next result is left to the reader as an exercise.


6.84 product of Banach spaces 


Suppose $V$ and $W$ are Banach spaces. Then $V \times W$ is a Banach space if given the norm defined by
$$\|(f,g)\| = \max\{\|f\|, \|g\|\}$$
for $f \in V$ and $g \in W$. With this norm, a sequence $(f_1,g_1), (f_2,g_2), \ldots$ in $V \times W$ converges to $(f,g)$ if and only if $\lim_{k\to\infty} f_k = f$ and $\lim_{k\to\infty} g_k = g$.


The next result gives a terrific way to show that a linear map between Banach 
spaces is bounded. The proof is remarkably clean because the hard work has been 
done in the proof of the Open Mapping Theorem (which was used to prove the 
Bounded Inverse Theorem). 
6.85 Closed Graph Theorem


Suppose $V$ and $W$ are Banach spaces and $T$ is a function from $V$ to $W$. Then $T$ is a bounded linear map if and only if $\operatorname{graph}(T)$ is a closed subspace of $V \times W$.


Proof First suppose $T$ is a bounded linear map. Suppose $(f_1,Tf_1), (f_2,Tf_2), \ldots$ is a sequence in $\operatorname{graph}(T)$ converging to $(f,g) \in V \times W$. Thus
$$\lim_{k\to\infty} f_k = f \quad\text{and}\quad \lim_{k\to\infty} Tf_k = g.$$
Because $T$ is continuous, the first equation above implies that $\lim_{k\to\infty} Tf_k = Tf$; when combined with the second equation above this implies that $g = Tf$. Thus $(f,g) = (f,Tf) \in \operatorname{graph}(T)$, which implies that $\operatorname{graph}(T)$ is closed, completing the proof in one direction.

To prove the other direction, now suppose $\operatorname{graph}(T)$ is a closed subspace of $V \times W$. Thus $\operatorname{graph}(T)$ is a Banach space with the norm that it inherits from $V \times W$ [from 6.84 and 6.16(b)]. Consider the linear map $S\colon \operatorname{graph}(T) \to V$ defined by
$$S(f,Tf) = f.$$


Then
$$\|S(f,Tf)\| = \|f\| \le \max\{\|f\|, \|Tf\|\} = \|(f,Tf)\|$$
for all $f \in V$. Thus $S$ is a bounded linear map from $\operatorname{graph}(T)$ onto $V$ with $\|S\| \le 1$. Clearly $S$ is injective. Thus the Bounded Inverse Theorem (6.83) implies that $S^{-1}$ is bounded. Because $S^{-1}\colon V \to \operatorname{graph}(T)$ satisfies the equation $S^{-1}f = (f,Tf)$, we have
$$\begin{aligned}
\|Tf\| &\le \max\{\|f\|, \|Tf\|\} \\
&= \|(f,Tf)\| \\
&= \|S^{-1}f\| \\
&\le \|S^{-1}\|\,\|f\|
\end{aligned}$$
for all $f \in V$. The inequality above implies that $T$ is a bounded linear map with $\|T\| \le \|S^{-1}\|$, completing the proof.


Principle of Uniform Boundedness 


The next result states that a family of bounded linear maps on a Banach space that is pointwise bounded is bounded in norm (which means that it is uniformly bounded as a collection of maps on the unit ball). This result is sometimes called the Banach-Steinhaus Theorem. Exercise 17 is also sometimes called the Banach-Steinhaus Theorem.

The Principle of Uniform Boundedness was proved in 1927 by Banach and Hugo Steinhaus (1887-1972). Steinhaus recruited Banach to advanced mathematics after overhearing him discuss Lebesgue integration in a park.
6.86 Principle of Uniform Boundedness


Suppose $V$ is a Banach space, $W$ is a normed vector space, and $\mathcal{A}$ is a family of bounded linear maps from $V$ to $W$ such that
$$\sup\{\|Tf\| : T \in \mathcal{A}\} < \infty \text{ for every } f \in V.$$
Then
$$\sup\{\|T\| : T \in \mathcal{A}\} < \infty.$$


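Proof Our hypothesis implies that
$$V = \bigcup_{n=1}^\infty \underbrace{\{f \in V : \|Tf\| \le n \text{ for all } T \in \mathcal{A}\}}_{V_n},$$
where $V_n$ is defined by the expression above. Because each $T \in \mathcal{A}$ is continuous, $V_n$ is a closed subset of $V$ for each $n \in \mathbf{Z}^+$. Thus Baire’s Theorem [6.76(a)] and the equation above imply that there exist $n \in \mathbf{Z}^+$ and $h \in V$ and $r > 0$ such that

6.87 $B(h,r) \subset V_n$.

Now suppose $g \in V$ and $\|g\| < 1$. Thus $rg + h \in B(h,r)$. Hence if $T \in \mathcal{A}$, then 6.87 implies $\|T(rg + h)\| \le n$, which implies that
$$\|Tg\| = \Bigl\|\frac{T(rg + h)}{r} - \frac{Th}{r}\Bigr\| \le \frac{\|T(rg + h)\|}{r} + \frac{\|Th\|}{r} \le \frac{n + \sup\{\|Th\| : T \in \mathcal{A}\}}{r}.$$
Thus
$$\sup\{\|T\| : T \in \mathcal{A}\} \le \frac{n + \sup\{\|Th\| : T \in \mathcal{A}\}}{r} < \infty,$$
completing the proof.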


EXERCISES 6E 


1 Suppose U is a subset of a metric space V. Show that U is dense in V if and 
only if every nonempty open subset of V contains at least one element of U. 


2 Suppose U is a subset of a metric space V. Show that U has an empty interior if 
and only if V \ U is dense in V. 


3 Prove or give a counterexample: If $V$ is a metric space and $U, W$ are subsets of $V$, then $(\operatorname{int} U) \cup (\operatorname{int} W) = \operatorname{int}(U \cup W)$.

4 Prove or give a counterexample: If $V$ is a metric space and $U, W$ are subsets of $V$, then $(\operatorname{int} U) \cap (\operatorname{int} W) = \operatorname{int}(U \cap W)$.

5 Suppose
$$X = \{0\} \cup \bigcup_{k=1}^\infty \Bigl\{\frac{1}{k}\Bigr\}$$
and $d(x,y) = |x - y|$ for $x, y \in X$.

(a) Show that $(X,d)$ is a complete metric space.

(b) Each set of the form $\{x\}$ for $x \in X$ is a closed subset of $\mathbf{R}$ that has an empty interior as a subset of $\mathbf{R}$. Clearly $X$ is a countable union of such sets. Explain why this does not violate the statement of Baire’s Theorem that a complete metric space is not the countable union of closed subsets with empty interior.

6 Give an example of a metric space that is the countable union of closed subsets with empty interior.

[This exercise shows that the completeness hypothesis in Baire’s Theorem cannot be dropped.]

7 (a) Define $f\colon \mathbf{R} \to \mathbf{R}$ as follows:
$$f(a) = \begin{cases} 0 & \text{if } a \text{ is irrational},\\ \frac{1}{n} & \text{if } a \text{ is rational and } n \text{ is the smallest positive integer such that } a = \frac{m}{n} \text{ for some integer } m. \end{cases}$$
At which numbers in $\mathbf{R}$ is $f$ continuous?

(b) Show that there does not exist a countable collection of open subsets of $\mathbf{R}$ whose intersection equals $\mathbf{Q}$.

(c) Show that there does not exist a function $f\colon \mathbf{R} \to \mathbf{R}$ such that $f$ is continuous at each element of $\mathbf{Q}$ and discontinuous at each element of $\mathbf{R} \setminus \mathbf{Q}$.

8 Suppose $(X,d)$ is a complete metric space and $G_1, G_2, \ldots$ is a sequence of dense open subsets of $X$. Prove that $\bigcap_{k=1}^\infty G_k$ is a dense subset of $X$.

9 Prove that there does not exist an infinite-dimensional Banach space with a countable basis.

[This exercise implies, for example, that there is not a norm that makes the vector space of polynomials with coefficients in $\mathbf{F}$ into a Banach space.]

10 Give an example of a Banach space $V$, a normed vector space $W$, a bounded linear map $T$ of $V$ onto $W$, and an open subset $G$ of $V$ such that $T(G)$ is not an open subset of $W$.

[This exercise shows that the hypothesis in the Open Mapping Theorem that $W$ is a Banach space cannot be relaxed to the hypothesis that $W$ is a normed vector space.]

11 Show that there exists a normed vector space $V$, a Banach space $W$, a bounded linear map $T$ of $V$ onto $W$, and an open subset $G$ of $V$ such that $T(G)$ is not an open subset of $W$.

[This exercise shows that the hypothesis in the Open Mapping Theorem that $V$ is a Banach space cannot be relaxed to the hypothesis that $V$ is a normed vector space.]
A linear map $T\colon V \to W$ from a normed vector space $V$ to a normed vector space $W$ is called bounded below if there exists $c \in (0,\infty)$ such that $\|f\| \le c\|Tf\|$ for all $f \in V$.


12 Suppose $T\colon V \to W$ is a bounded linear map from a Banach space $V$ to a Banach space $W$. Prove that $T$ is bounded below if and only if $T$ is injective and the range of $T$ is a closed subspace of $W$.

13 Give an example of a Banach space $V$, a normed vector space $W$, and a one-to-one bounded linear map $T$ of $V$ onto $W$ such that $T^{-1}$ is not a bounded linear map of $W$ onto $V$.
[This exercise shows that the hypothesis in the Bounded Inverse Theorem (6.83) that $W$ is a Banach space cannot be relaxed to the hypothesis that $W$ is a normed vector space.]

14 Show that there exists a normed vector space $V$, a Banach space $W$, and a one-to-one bounded linear map $T$ of $V$ onto $W$ such that $T^{-1}$ is not a bounded linear map of $W$ onto $V$.
[This exercise shows that the hypothesis in the Bounded Inverse Theorem (6.83) that $V$ is a Banach space cannot be relaxed to the hypothesis that $V$ is a normed vector space.]


15 Prove 6.84. 


16 Suppose $V$ is a Banach space with norm $\|\cdot\|$ and that $\varphi\colon V \to \mathbf{F}$ is a linear functional. Define another norm $\|\cdot\|_\varphi$ on $V$ by
$$\|f\|_\varphi = \|f\| + |\varphi(f)|.$$
Prove that if $V$ is a Banach space with the norm $\|\cdot\|_\varphi$, then $\varphi$ is a continuous linear functional on $V$ (with the original norm).


17 Suppose $V$ is a Banach space, $W$ is a normed vector space, and $T_1, T_2, \ldots$ is a sequence of bounded linear maps from $V$ to $W$ such that $\lim_{k\to\infty} T_k f$ exists for each $f \in V$. Define $T\colon V \to W$ by
$$Tf = \lim_{k\to\infty} T_k f$$
for $f \in V$. Prove that $T$ is a bounded linear map from $V$ to $W$.
[This result states that the pointwise limit of a sequence of bounded linear maps on a Banach space is a bounded linear map.]


18 Suppose $V$ is a normed vector space and $B$ is a subset of $V$ such that
$$\sup_{f \in B} |\varphi(f)| < \infty$$
for every $\varphi \in V'$. Prove that $\sup_{f \in B} \|f\| < \infty$.


19 Suppose $T\colon V \to W$ is a linear map from a Banach space $V$ to a Banach space $W$ such that
$$\varphi \circ T \in V' \text{ for all } \varphi \in W'.$$
Prove that $T$ is a bounded linear map.


Chapter 7 $L^p$ Spaces


Fix a measure space $(X,\mathcal{S},\mu)$ and a positive number $p$. We begin this chapter by looking at the vector space of measurable functions $f\colon X \to \mathbf{F}$ such that
$$\int |f|^p \, d\mu < \infty.$$
Important results called Hölder’s inequality and Minkowski’s inequality help us investigate this vector space. A useful class of Banach spaces appears when we identify functions that differ only on a set of measure 0 and require $p \ge 1$.
7A $\mathcal{L}^p(\mu)$


Hölder’s Inequality


Our next major goal is to define an important class of vector spaces that generalize the vector spaces $\mathcal{L}^1(\mu)$ and $\ell^\infty$ introduced in the last two bullet points of Example 6.32. We begin this process with the definition below. The terminology p-norm introduced below is convenient, even though it is not necessarily a norm.


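7.1 Definition $\|f\|_p$

Suppose that $(X,\mathcal{S},\mu)$ is a measure space, $0 < p < \infty$, and $f\colon X \to \mathbf{F}$ is $\mathcal{S}$-measurable. Then the p-norm of $f$ is denoted by $\|f\|_p$ and is defined by
$$\|f\|_p = \Bigl(\int |f|^p \, d\mu\Bigr)^{1/p}.$$
Also, $\|f\|_\infty$, which is called the essential supremum of $f$, is defined by
$$\|f\|_\infty = \inf\bigl\{t > 0 : \mu(\{x \in X : |f(x)| > t\}) = 0\bigr\}.$$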


The exponent $1/p$ appears in the definition of the p-norm $\|f\|_p$ because we want the equation $\|\alpha f\|_p = |\alpha|\,\|f\|_p$ to hold for all $\alpha \in \mathbf{F}$.

For $0 < p < \infty$, the p-norm $\|f\|_p$ does not change if $f$ changes on a set of $\mu$-measure 0. By using the essential supremum rather than the supremum in the definition of $\|f\|_\infty$, we arrange for the $\infty$-norm $\|f\|_\infty$ to enjoy this same property. Think of $\|f\|_\infty$ as the smallest that you can make the supremum of $|f|$ after modifications on sets of measure 0.


7.2 Example p-norm for counting measure 


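Suppose $\mu$ is counting measure on $\mathbf{Z}^+$. If $a = (a_1, a_2, \ldots)$ is a sequence in $\mathbf{F}$ and $0 < p < \infty$, then
$$\|a\|_p = \Bigl(\sum_{k=1}^\infty |a_k|^p\Bigr)^{1/p} \quad\text{and}\quad \|a\|_\infty = \sup\{|a_k| : k \in \mathbf{Z}^+\}.$$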


Note that for counting measure, the essential supremum and the supremum are the 
same because in this case there are no sets of measure 0 other than the empty set. 
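For counting measure, the p-norm of Example 7.2 is a finite computation whenever only finitely many terms are nonzero. The minimal Python sketch below (the helper name `p_norm` is hypothetical) treats a finite list as a sequence whose remaining terms are 0 and uses `math.inf` to stand for $p = \infty$. It also spot-checks the homogeneity $\|\alpha a\|_p = |\alpha|\,\|a\|_p$ that motivates the exponent $1/p$.

```python
import math

def p_norm(a, p):
    """p-norm of a finitely supported sequence for counting measure (Example 7.2).

    `a` lists the possibly nonzero terms a_1, ..., a_n (real or complex);
    all later terms are taken to be 0. Pass p = math.inf for the sup norm.
    """
    if p == math.inf:
        return max((abs(x) for x in a), default=0.0)
    return sum(abs(x) ** p for x in a) ** (1 / p)

a = [3, -4]
print(p_norm(a, 1), p_norm(a, 2), p_norm(a, math.inf))  # 7.0 5.0 4

# Homogeneity ||alpha a||_p = |alpha| ||a||_p, checked also for a p below 1.
alpha = 2 - 1j
for p in (0.5, 1, 2, 3.5):
    scaled = [alpha * x for x in a]
    assert math.isclose(p_norm(scaled, p), abs(alpha) * p_norm(a, p))
```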


Now we can define our generalization of $\mathcal{L}^1(\mu)$, which was defined in the second-to-last bullet point of Example 6.32.


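7.3 Definition $\mathcal{L}^p(\mu)$

Suppose $(X,\mathcal{S},\mu)$ is a measure space and $0 < p \le \infty$. The Lebesgue space $\mathcal{L}^p(\mu)$, sometimes denoted $\mathcal{L}^p(X,\mathcal{S},\mu)$, is defined to be the set of $\mathcal{S}$-measurable functions $f\colon X \to \mathbf{F}$ such that $\|f\|_p < \infty$.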
7.4 Example $\ell^p$

When $\mu$ is counting measure on $\mathbf{Z}^+$, the set $\mathcal{L}^p(\mu)$ is often denoted by $\ell^p$ (pronounced little el-p). Thus if $0 < p < \infty$, then
$$\ell^p = \Bigl\{(a_1, a_2, \ldots) : \text{each } a_k \in \mathbf{F} \text{ and } \sum_{k=1}^\infty |a_k|^p < \infty\Bigr\}$$
and
$$\ell^\infty = \Bigl\{(a_1, a_2, \ldots) : \text{each } a_k \in \mathbf{F} \text{ and } \sup_{k \in \mathbf{Z}^+} |a_k| < \infty\Bigr\}.$$


Inequality 7.5(a) below provides an easy proof that $\mathcal{L}^p(\mu)$ is closed under addition. Soon we will prove Minkowski’s inequality (7.14), which provides an important improvement of 7.5(a) when $p \ge 1$ but is more complicated to prove.


7.5 $\mathcal{L}^p(\mu)$ is a vector space

Suppose $(X,\mathcal{S},\mu)$ is a measure space and $0 < p < \infty$. Then

(a) $\|f + g\|_p^p \le 2^p\bigl(\|f\|_p^p + \|g\|_p^p\bigr)$

and

(b) $\|\alpha f\|_p = |\alpha|\,\|f\|_p$

for all $f, g \in \mathcal{L}^p(\mu)$ and all $\alpha \in \mathbf{F}$. Furthermore, with the usual operations of addition and scalar multiplication of functions, $\mathcal{L}^p(\mu)$ is a vector space.


Proof Suppose $f, g \in \mathcal{L}^p(\mu)$. If $x \in X$, then
$$\begin{aligned}
|f(x) + g(x)|^p &\le \bigl(|f(x)| + |g(x)|\bigr)^p \\
&\le \bigl(2\max\{|f(x)|, |g(x)|\}\bigr)^p \\
&\le 2^p\bigl(|f(x)|^p + |g(x)|^p\bigr).
\end{aligned}$$
Integrating both sides of the inequality above with respect to $\mu$ gives the desired inequality
$$\|f + g\|_p^p \le 2^p\bigl(\|f\|_p^p + \|g\|_p^p\bigr).$$
This inequality implies that if $\|f\|_p < \infty$ and $\|g\|_p < \infty$, then $\|f + g\|_p < \infty$. Thus $\mathcal{L}^p(\mu)$ is closed under addition.

The proof that
$$\|\alpha f\|_p = |\alpha|\,\|f\|_p$$
follows easily from the definition of $\|\cdot\|_p$. This equality implies that $\mathcal{L}^p(\mu)$ is closed under scalar multiplication.

Because $\mathcal{L}^p(\mu)$ contains the constant function 0 and is closed under addition and scalar multiplication, $\mathcal{L}^p(\mu)$ is a subspace of $\mathbf{F}^X$ and thus is a vector space.
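Inequality 7.5(a) holds for every $0 < p < \infty$, while the triangle inequality for $\|\cdot\|_p$ can fail when $p < 1$; this is one reason the p-norm is not necessarily a norm. The short Python check below uses counting measure on a two-point set with $f = (1,0)$, $g = (0,1)$, and $p = 1/2$ (an illustrative choice, not taken from the text).

```python
def p_norm(a, p):
    """p-norm of a finite list with respect to counting measure, 0 < p < infinity."""
    return sum(abs(x) ** p for x in a) ** (1 / p)

p = 0.5
f, g = [1.0, 0.0], [0.0, 1.0]
s = [x + y for x, y in zip(f, g)]

# The triangle inequality fails for p = 1/2: ||f + g|| = 4 > 2 = ||f|| + ||g||.
print(p_norm(s, p), p_norm(f, p) + p_norm(g, p))  # 4.0 2.0

# Inequality 7.5(a), ||f + g||_p^p <= 2^p (||f||_p^p + ||g||_p^p), still holds.
lhs = p_norm(s, p) ** p
rhs = 2 ** p * (p_norm(f, p) ** p + p_norm(g, p) ** p)
assert lhs <= rhs
```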
What we call the dual exponent in the definition below is often called the conjugate exponent or the conjugate index. However, the terminology dual exponent conveys more meaning because of results (7.25 and 7.26) that we will see in the next section.

7.6 Definition dual exponent; $p'$

For $1 \le p \le \infty$, the dual exponent of $p$ is denoted by $p'$ and is the element of $[1,\infty]$ such that
$$\frac{1}{p} + \frac{1}{p'} = 1.$$


7.7 Example dual exponents 


$1' = \infty, \quad \infty' = 1, \quad 2' = 2, \quad 4' = 4/3, \quad (4/3)' = 4$
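Solving $1/p + 1/p' = 1$ gives $p' = p/(p-1)$ for $1 < p < \infty$, with the endpoint conventions $1' = \infty$ and $\infty' = 1$. The Python sketch below (the function name is hypothetical) uses `fractions.Fraction` so that the values in Example 7.7 come out exactly, and `math.inf` to stand for $\infty$.

```python
import math
from fractions import Fraction

def dual_exponent(p):
    """Return p' with 1/p + 1/p' = 1, for 1 <= p <= infinity.

    Finite p is converted to an exact Fraction; math.inf stands for infinity.
    """
    if p == math.inf:
        return Fraction(1)          # infinity' = 1
    p = Fraction(p)
    if p < 1:
        raise ValueError("the dual exponent is defined only for 1 <= p <= infinity")
    if p == 1:
        return math.inf             # 1' = infinity
    return p / (p - 1)              # solves 1/p + 1/p' = 1

print(*(dual_exponent(p) for p in (1, math.inf, 2, 4, Fraction(4, 3))))  # inf 1 2 4/3 4
```

Exact rationals keep checks such as $(4/3)' = 4$ free of rounding; note also that $(p')' = p$.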


The result below is a key tool in proving Hölder’s inequality (7.9).


7.8 Young’s inequality 


Suppose $1 < p < \infty$. Then
$$ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}$$
for all $a \ge 0$ and $b \ge 0$.

Proof Fix $b > 0$ and define a function $f\colon (0,\infty) \to \mathbf{R}$ by
$$f(a) = \frac{a^p}{p} + \frac{b^{p'}}{p'} - ab.$$
Thus $f'(a) = a^{p-1} - b$. Hence $f$ is decreasing on the interval $(0, b^{1/(p-1)})$ and $f$ is increasing on the interval $(b^{1/(p-1)}, \infty)$. Thus $f$ has a global minimum at $b^{1/(p-1)}$. A tiny bit of arithmetic [use $p/(p-1) = p'$] shows that $f(b^{1/(p-1)}) = 0$. Thus $f(a) \ge 0$ for all $a \in (0,\infty)$, which implies the desired inequality.

William Henry Young (1863-1942) published what is now called Young’s inequality in 1912.
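The proof of 7.8 locates the minimum of $f(a) = a^p/p + b^{p'}/p' - ab$ at $a = b^{1/(p-1)}$, where $f$ vanishes. A quick floating-point check in Python (the helper name `young_gap` and the grid of sample values are illustrative assumptions) confirms both facts on a small grid; it illustrates the inequality but of course proves nothing.

```python
import itertools

def young_gap(a, b, p):
    """Return a^p/p + b^{p'}/p' - a*b for a, b >= 0 and 1 < p < infinity."""
    q = p / (p - 1)  # dual exponent p'
    return a ** p / p + b ** q / q - a * b

grid = [0.0, 0.3, 1.0, 2.5, 7.0]
for p in (1.2, 2.0, 3.0, 6.0):
    # Young's inequality: the gap is never negative (up to rounding error).
    for a, b in itertools.product(grid, grid):
        assert young_gap(a, b, p) >= -1e-12 * (1 + a * b)
    # For fixed b > 0, the minimum over a sits at a = b^(1/(p-1)) and equals 0.
    for b in grid[1:]:
        a = b ** (1 / (p - 1))
        assert abs(young_gap(a, b, p)) <= 1e-9 * (1 + a * b)
```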


The important result below furnishes a key tool that is used in the proof of 
Minkowski’s inequality (7.14). 


7.9 Hölder’s inequality

Suppose $(X,\mathcal{S},\mu)$ is a measure space, $1 \le p \le \infty$, and $f, h\colon X \to \mathbf{F}$ are $\mathcal{S}$-measurable. Then
$$\|fh\|_1 \le \|f\|_p\,\|h\|_{p'}.$$
Proof Suppose $1 < p < \infty$, leaving the cases $p = 1$ and $p = \infty$ as exercises for the reader.

First consider the special case where $\|f\|_p = \|h\|_{p'} = 1$. Young’s inequality (7.8) tells us that
$$|f(x)h(x)| \le \frac{|f(x)|^p}{p} + \frac{|h(x)|^{p'}}{p'}$$
for all $x \in X$. Integrating both sides of the inequality above with respect to $\mu$ shows that $\|fh\|_1 \le \frac{1}{p} + \frac{1}{p'} = 1 = \|f\|_p\,\|h\|_{p'}$, completing the proof in this special case.

If $\|f\|_p = 0$ or $\|h\|_{p'} = 0$, then $\|fh\|_1 = 0$ and the desired inequality holds. Similarly, if $\|f\|_p = \infty$ or $\|h\|_{p'} = \infty$, then the desired inequality clearly holds. Thus we assume that $0 < \|f\|_p < \infty$ and $0 < \|h\|_{p'} < \infty$.

Now define $\mathcal{S}$-measurable functions $f_1, h_1\colon X \to \mathbf{F}$ by
$$f_1 = \frac{f}{\|f\|_p} \quad\text{and}\quad h_1 = \frac{h}{\|h\|_{p'}}.$$
Then $\|f_1\|_p = 1$ and $\|h_1\|_{p'} = 1$. By the result for our special case, we have $\|f_1h_1\|_1 \le 1$, which implies that $\|fh\|_1 \le \|f\|_p\,\|h\|_{p'}$.

Hölder’s inequality was proved in 1889 by Otto Hölder (1859-1937).
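For counting measure on a finite set, Hölder's inequality reads $\sum_k |f_k h_k| \le \bigl(\sum_k |f_k|^p\bigr)^{1/p}\bigl(\sum_k |h_k|^{p'}\bigr)^{1/p'}$. The Python sketch below (helper names hypothetical, random data an illustrative assumption) checks this numerically. It also follows the proof by first normalizing to $\|f_1\|_p = \|h_1\|_{p'} = 1$ and then summing Young's inequality term by term.

```python
import random

def p_norm(a, p):
    """p-norm of a finite list with respect to counting measure, 0 < p < infinity."""
    return sum(abs(x) ** p for x in a) ** (1 / p)

def holder_sides(f, h, p):
    """Return (||fh||_1, ||f||_p ||h||_{p'}) for 1 < p < infinity."""
    q = p / (p - 1)  # dual exponent p'
    lhs = sum(abs(x * y) for x, y in zip(f, h))
    return lhs, p_norm(f, p) * p_norm(h, q)

random.seed(0)
for _ in range(1000):
    p = random.uniform(1.05, 8.0)
    q = p / (p - 1)
    f = [random.uniform(-1, 1) for _ in range(6)]
    h = [random.uniform(-1, 1) for _ in range(6)]
    lhs, rhs = holder_sides(f, h, p)
    assert lhs <= rhs * (1 + 1e-12)
    # Special case of the proof: after normalizing, summing Young's inequality
    # |f1 h1| <= |f1|^p/p + |h1|^q/q term by term gives a total of 1/p + 1/q = 1.
    f1 = [x / p_norm(f, p) for x in f]
    h1 = [y / p_norm(h, q) for y in h]
    young_total = sum(abs(x) ** p / p + abs(y) ** q / q for x, y in zip(f1, h1))
    assert abs(young_total - 1) < 1e-9
    assert sum(abs(x * y) for x, y in zip(f1, h1)) <= young_total + 1e-12
```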


The next result gives a key containment among Lebesgue spaces with respect to a finite measure. Note the crucial role that Hölder’s inequality plays in the proof.


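7.10 $\mathcal{L}^q(\mu) \subset \mathcal{L}^p(\mu)$ if $p < q$ and $\mu(X) < \infty$

Suppose $(X,\mathcal{S},\mu)$ is a finite measure space and $0 < p < q < \infty$. Then
$$\|f\|_p \le \mu(X)^{(q-p)/(pq)}\,\|f\|_q$$
for all $f \in \mathcal{L}^q(\mu)$. Furthermore, $\mathcal{L}^q(\mu) \subset \mathcal{L}^p(\mu)$.

Proof Fix $f \in \mathcal{L}^q(\mu)$. Let $r = \frac{q}{p}$. Thus $r > 1$. A short calculation shows that $r' = \frac{q}{q-p}$. Now Hölder’s inequality (7.9) with $p$ replaced by $r$ and $f$ replaced by $|f|^p$ and $h$ replaced by the constant function 1 gives
$$\begin{aligned}
\int |f|^p \, d\mu &\le \Bigl(\int (|f|^p)^r \, d\mu\Bigr)^{1/r} \Bigl(\int 1^{r'} \, d\mu\Bigr)^{1/r'} \\
&= \mu(X)^{(q-p)/q} \Bigl(\int |f|^q \, d\mu\Bigr)^{p/q}.
\end{aligned}$$
Now raise both sides of the inequality above to the power $\frac{1}{p}$, getting
$$\Bigl(\int |f|^p \, d\mu\Bigr)^{1/p} \le \mu(X)^{(q-p)/(pq)} \Bigl(\int |f|^q \, d\mu\Bigr)^{1/q},$$
which is the desired inequality.

The inequality above shows that $f \in \mathcal{L}^p(\mu)$. Thus $\mathcal{L}^q(\mu) \subset \mathcal{L}^p(\mu)$.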
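The inequality in 7.10 can be checked directly for a finite measure on a finite set, where integrals become weighted sums. In the Python sketch below the point masses `weights`, the sample function values, and the helper names are arbitrary illustrative assumptions. A constant function gives equality, which shows that the factor $\mu(X)^{(q-p)/(pq)}$ cannot be decreased.

```python
import math

def weighted_p_norm(values, weights, p):
    """(sum_i weights[i] * |values[i]|^p)^(1/p): the p-norm on a finite set whose
    i-th point has measure weights[i] > 0, for 0 < p < infinity."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)

def bound_7_10(values, weights, p, q):
    """Right side of 7.10: mu(X)^((q - p)/(p q)) * ||f||_q."""
    mu_X = sum(weights)
    return mu_X ** ((q - p) / (p * q)) * weighted_p_norm(values, weights, q)

weights = [0.2, 1.5, 0.3, 2.0]   # mu(X) = 4
values = [0.5, -2.0, 3.0, 0.1]
for p, q in [(0.5, 1.0), (1.0, 2.0), (1.5, 4.0), (2.0, 7.0)]:
    assert weighted_p_norm(values, weights, p) <= bound_7_10(values, weights, p, q)

# Equality for a constant function: both sides equal |c| * mu(X)^(1/p).
const = [3.0] * len(weights)
assert math.isclose(weighted_p_norm(const, weights, 1.5), bound_7_10(const, weights, 1.5, 4.0))
```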
7.11 Example $\mathcal{L}^p(E)$

We adopt the common convention that if $E$ is a Borel (or Lebesgue measurable) subset of $\mathbf{R}$ and $0 < p \le \infty$, then $\mathcal{L}^p(E)$ means $\mathcal{L}^p(\lambda_E)$, where $\lambda_E$ denotes Lebesgue measure $\lambda$ restricted to the Borel (or Lebesgue measurable) subsets of $\mathbf{R}$ that are contained in $E$.

With this convention, 7.10 implies that
$$\text{if } 0 < p < q < \infty, \text{ then } \mathcal{L}^q([0,1]) \subset \mathcal{L}^p([0,1]) \text{ and } \|f\|_p \le \|f\|_q$$
for $f \in \mathcal{L}^q([0,1])$. See Exercises 12 and 13 in this section for related results.


Minkowski’s Inequality 


The next result is used as a tool to prove Minkowski’s inequality (7.14). Once again, note the crucial role that Hölder’s inequality plays in the proof.


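7.12 formula for $\|f\|_p$

Suppose $(X,\mathcal{S},\mu)$ is a measure space, $1 \le p < \infty$, and $f \in \mathcal{L}^p(\mu)$. Then
$$\|f\|_p = \sup\Bigl\{\Bigl|\int fh \, d\mu\Bigr| : h \in \mathcal{L}^{p'}(\mu) \text{ and } \|h\|_{p'} \le 1\Bigr\}.$$

Proof If $\|f\|_p = 0$, then both sides of the equation in the conclusion of this result equal 0. Thus we assume that $\|f\|_p \neq 0$.

Hölder’s inequality (7.9) implies that if $h \in \mathcal{L}^{p'}(\mu)$ and $\|h\|_{p'} \le 1$, then
$$\Bigl|\int fh \, d\mu\Bigr| \le \int |fh| \, d\mu \le \|f\|_p\,\|h\|_{p'} \le \|f\|_p.$$
Thus $\sup\bigl\{\bigl|\int fh \, d\mu\bigr| : h \in \mathcal{L}^{p'}(\mu) \text{ and } \|h\|_{p'} \le 1\bigr\} \le \|f\|_p$.

To prove the inequality in the other direction, define $h\colon X \to \mathbf{F}$ by
$$h(x) = \frac{\overline{f(x)}\,|f(x)|^{p-2}}{\|f\|_p^{\,p-1}} \quad \text{(set } h(x) = 0 \text{ when } f(x) = 0\text{)}.$$
Then $\int fh \, d\mu = \|f\|_p$ and $\|h\|_{p'} = 1$, as you should verify (use $p - \frac{p}{p'} = 1$). Thus $\|f\|_p \le \sup\bigl\{\bigl|\int fh \, d\mu\bigr| : h \in \mathcal{L}^{p'}(\mu) \text{ and } \|h\|_{p'} \le 1\bigr\}$, as desired.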
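The function $h$ used in the second half of the proof of 7.12 can be computed explicitly for counting measure on a finite set. The Python sketch below (the function name and the sample data are hypothetical) builds $h(x) = \overline{f(x)}\,|f(x)|^{p-2}/\|f\|_p^{p-1}$ for complex-valued data and checks numerically that $\int fh\,d\mu = \|f\|_p$ and $\|h\|_{p'} = 1$, the two facts the proof asks you to verify. It restricts to $1 < p < \infty$ so that $p'$ is finite.

```python
import math

def extremal_h(f, p):
    """h(x) = conj(f(x)) |f(x)|^(p-2) / ||f||_p^(p-1), with h(x) = 0 where f(x) = 0,
    for a finite list f (counting measure) and 1 < p < infinity."""
    norm = sum(abs(x) ** p for x in f) ** (1 / p)
    return [0 if x == 0 else x.conjugate() * abs(x) ** (p - 2) / norm ** (p - 1) for x in f]

f = [1 + 2j, -3, 0, 0.5j]
for p in (1.5, 2, 3, 5):
    q = p / (p - 1)  # dual exponent p'
    h = extremal_h(f, p)
    norm_f = sum(abs(x) ** p for x in f) ** (1 / p)
    pairing = sum(x * y for x, y in zip(f, h))       # integral of f h for counting measure
    norm_h = sum(abs(y) ** q for y in h) ** (1 / q)  # ||h||_{p'}
    assert abs(pairing - norm_f) < 1e-9
    assert math.isclose(norm_h, 1.0)
```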


7.13 Example a point with infinite measure

Suppose $X$ is a set with exactly one element $b$ and $\mu$ is the measure such that $\mu(\varnothing) = 0$ and $\mu(\{b\}) = \infty$. Then $\mathcal{L}^1(\mu)$ consists only of the 0 function. Thus if $p = \infty$ and $f$ is the function whose value at $b$ equals 1, then $\|f\|_\infty = 1$ but the right side of the equation in 7.12 equals 0. Thus 7.12 can fail when $p = \infty$.

Example 7.13 shows that we cannot take $p = \infty$ in 7.12. However, if $\mu$ is a $\sigma$-finite measure, then 7.12 holds even when $p = \infty$ (see Exercise 9).
The next result, which is called Minkowski’s inequality, is an improvement for $p \ge 1$ of the inequality 7.5(a).

7.14 Minkowski’s inequality

Suppose $(X,\mathcal{S},\mu)$ is a measure space, $1 \le p \le \infty$, and $f, g \in \mathcal{L}^p(\mu)$. Then
$$\|f + g\|_p \le \|f\|_p + \|g\|_p.$$

Proof Assume that $1 \le p < \infty$ (the case $p = \infty$ is left as an exercise for the reader). Inequality 7.5(a) implies that $f + g \in \mathcal{L}^p(\mu)$.

Suppose $h \in \mathcal{L}^{p'}(\mu)$ and $\|h\|_{p'} \le 1$. Then
$$\Bigl|\int (f + g)h \, d\mu\Bigr| \le \int |fh| \, d\mu + \int |gh| \, d\mu \le \bigl(\|f\|_p + \|g\|_p\bigr)\|h\|_{p'} \le \|f\|_p + \|g\|_p,$$
where the second inequality comes from Hölder’s inequality (7.9). Now take the supremum of the left side of the inequality above over the set of $h \in \mathcal{L}^{p'}(\mu)$ such that $\|h\|_{p'} \le 1$. By 7.12, we get $\|f + g\|_p \le \|f\|_p + \|g\|_p$, as desired.
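For counting measure on a finite set, Minkowski's inequality is the familiar triangle inequality for the p-norm on $\mathbf{F}^n$. The Python sketch below (helper name hypothetical, random complex data an illustrative assumption) spot-checks it for several values of $p \ge 1$ and for $p = \infty$ via the sup norm; floating-point checks of this kind illustrate the inequality but do not replace the proof.

```python
import math
import random

def p_norm(a, p):
    """p-norm of a finite list for counting measure; p = math.inf gives the sup norm."""
    if p == math.inf:
        return max((abs(x) for x in a), default=0.0)
    return sum(abs(x) ** p for x in a) ** (1 / p)

random.seed(1)
for _ in range(2000):
    p = random.choice([1, 1.5, 2, 3.7, 10, math.inf])
    f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
    g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
    s = [x + y for x, y in zip(f, g)]
    assert p_norm(s, p) <= (p_norm(f, p) + p_norm(g, p)) * (1 + 1e-12)

# Equality holds when g is a nonnegative multiple of f.
f = [1.0, -2.0, 0.5]
g = [3 * x for x in f]
s = [x + y for x, y in zip(f, g)]
assert math.isclose(p_norm(s, 3), p_norm(f, 3) + p_norm(g, 3))
```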


EXERCISES 7A 


1 Suppose $\mu$ is a measure. Prove that
$$\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty \quad\text{and}\quad \|\alpha f\|_\infty = |\alpha|\,\|f\|_\infty$$
for all $f, g \in \mathcal{L}^\infty(\mu)$ and all $\alpha \in \mathbf{F}$. Conclude that with the usual operations of addition and scalar multiplication of functions, $\mathcal{L}^\infty(\mu)$ is a vector space.

2 Suppose $a > 0$, $b > 0$, and $1 < p < \infty$. Prove that
$$ab = \frac{a^p}{p} + \frac{b^{p'}}{p'}$$
if and only if $a^p = b^{p'}$ [compare to Young’s inequality (7.8)].

3 Suppose $a_1, \ldots, a_n$ are nonnegative numbers. Prove that
$$(a_1 + \cdots + a_n)^5 \le n^4\bigl(a_1^5 + \cdots + a_n^5\bigr).$$

4 Prove Hölder’s inequality (7.9) in the cases $p = 1$ and $p = \infty$.

5 Suppose that $(X,\mathcal{S},\mu)$ is a measure space, $1 < p < \infty$, $f \in \mathcal{L}^p(\mu)$, and $h \in \mathcal{L}^{p'}(\mu)$. Prove that Hölder’s inequality (7.9) is an equality if and only if there exist nonnegative numbers $a$ and $b$, not both 0, such that
$$a|f(x)|^p = b|h(x)|^{p'}$$
for almost every $x \in X$.
6 Suppose $(X,\mathcal{S},\mu)$ is a measure space, $f \in \mathcal{L}^1(\mu)$, and $h \in \mathcal{L}^\infty(\mu)$. Prove that $\|fh\|_1 = \|f\|_1\,\|h\|_\infty$ if and only if
$$|h(x)| = \|h\|_\infty$$
for almost every $x \in X$ such that $f(x) \neq 0$.

7 Suppose $(X,\mathcal{S},\mu)$ is a measure space and $f, h\colon X \to \mathbf{F}$ are $\mathcal{S}$-measurable. Prove that
$$\|fh\|_r \le \|f\|_p\,\|h\|_q$$
for all positive numbers $p, q, r$ such that $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$.

8 Suppose $(X,\mathcal{S},\mu)$ is a measure space and $n \in \mathbf{Z}^+$. Prove that
$$\|f_1 f_2 \cdots f_n\|_1 \le \|f_1\|_{p_1}\,\|f_2\|_{p_2} \cdots \|f_n\|_{p_n}$$
for all positive numbers $p_1, \ldots, p_n$ such that $\frac{1}{p_1} + \frac{1}{p_2} + \cdots + \frac{1}{p_n} = 1$ and all $\mathcal{S}$-measurable functions $f_1, f_2, \ldots, f_n\colon X \to \mathbf{F}$.


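9 Show that the formula in 7.12 holds for $p = \infty$ if $\mu$ is a $\sigma$-finite measure.

10 Suppose $0 < p < q \le \infty$.

(a) Prove that $\ell^p \subset \ell^q$.

(b) Prove that $\|(a_1, a_2, \ldots)\|_p \ge \|(a_1, a_2, \ldots)\|_q$ for every sequence $a_1, a_2, \ldots$ of elements of $\mathbf{F}$.

11 Show that $\bigcap_{p>1} \ell^p \neq \ell^1$.

12 Show that $\bigcap_{p<\infty} \mathcal{L}^p([0,1]) \neq \mathcal{L}^\infty([0,1])$.

13 Show that $\bigcup_{p>1} \mathcal{L}^p([0,1]) \neq \mathcal{L}^1([0,1])$.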


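14 Suppose $p, q \in (0,\infty]$, with $p \neq q$. Prove that neither of the sets $\mathcal{L}^p(\mathbf{R})$ and $\mathcal{L}^q(\mathbf{R})$ is a subset of the other.

15 Show that there exists $f \in \mathcal{L}^2(\mathbf{R})$ such that $f \notin \mathcal{L}^p(\mathbf{R})$ for all $p \in (0,\infty] \setminus \{2\}$.

16 Suppose $(X,\mathcal{S},\mu)$ is a finite measure space. Prove that
$$\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$$
for every $\mathcal{S}$-measurable function $f\colon X \to \mathbf{F}$.

17 Suppose $\mu$ is a measure, $0 < p < \infty$, and $f \in \mathcal{L}^p(\mu)$. Prove that for every $\varepsilon > 0$, there exists a simple function $g \in \mathcal{L}^p(\mu)$ such that $\|f - g\|_p < \varepsilon$.
[This exercise extends 3.44.]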
18 Suppose $0 < p < \infty$ and $f \in \mathcal{L}^p(\mathbf{R})$. Prove that for every $\varepsilon > 0$, there exists a step function $g \in \mathcal{L}^p(\mathbf{R})$ such that $\|f - g\|_p < \varepsilon$.
[This exercise extends 3.47.]

19 Suppose $0 < p < \infty$ and $f \in \mathcal{L}^p(\mathbf{R})$. Prove that for every $\varepsilon > 0$, there exists a continuous function $g\colon \mathbf{R} \to \mathbf{R}$ such that $\|f - g\|_p < \varepsilon$ and the set $\{x \in \mathbf{R} : g(x) \neq 0\}$ is bounded.
[This exercise extends 3.48.]

20 Suppose $(X,\mathcal{S},\mu)$ is a measure space, $1 < p < \infty$, and $f, g \in \mathcal{L}^p(\mu)$. Prove that Minkowski’s inequality (7.14) is an equality if and only if there exist nonnegative numbers $a$ and $b$, not both 0, such that
$$af(x) = bg(x)$$
for almost every $x \in X$.

21 Suppose $(X,\mathcal{S},\mu)$ is a measure space and $f, g \in \mathcal{L}^1(\mu)$. Prove that
$$\|f + g\|_1 = \|f\|_1 + \|g\|_1$$
if and only if $f(x)\overline{g(x)} \ge 0$ for almost every $x \in X$.


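22 Suppose $(X,\mathcal{S},\mu)$ and $(Y,\mathcal{T},\nu)$ are $\sigma$-finite measure spaces and $0 < p < \infty$. Prove that if $f \in \mathcal{L}^p(\mu \times \nu)$, then
$$[f]_x \in \mathcal{L}^p(\nu) \text{ for almost every } x \in X$$
and
$$[f]^y \in \mathcal{L}^p(\mu) \text{ for almost every } y \in Y,$$
where $[f]_x$ and $[f]^y$ are the cross sections of $f$ as defined in 5.7.

23 Suppose $1 \le p < \infty$ and $f \in \mathcal{L}^p(\mathbf{R})$.

(a) For $t \in \mathbf{R}$, define $f_t\colon \mathbf{R} \to \mathbf{R}$ by $f_t(x) = f(x - t)$. Prove that the function $t \mapsto \|f - f_t\|_p$ is bounded and uniformly continuous on $\mathbf{R}$.

(b) For $t > 0$, define $f_t\colon \mathbf{R} \to \mathbf{R}$ by $f_t(x) = f(tx)$. Prove that
$$\lim_{t\to 1} \|f - f_t\|_p = 0.$$

24 Suppose $1 \le p < \infty$ and $f \in \mathcal{L}^p(\mathbf{R})$. Prove that
$$\lim_{t \downarrow 0} \frac{1}{2t} \int_{b-t}^{b+t} |f - f(b)|^p \, d\lambda = 0$$
for almost every $b \in \mathbf{R}$.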
Definition of $L^p(\mu)$

Suppose $(X,\mathcal{S},\mu)$ is a measure space and $1 \le p \le \infty$. If there exists a nonempty set $E \in \mathcal{S}$ such that $\mu(E) = 0$, then $\|\chi_E\|_p = 0$ even though $\chi_E \neq 0$; thus $\|\cdot\|_p$ is not a norm on $\mathcal{L}^p(\mu)$. The standard way to deal with this problem is to identify functions that differ only on a set of $\mu$-measure 0. To help make this process more rigorous, we introduce the following definitions.


7.15 Definition $\mathcal{Z}(\mu)$; $\tilde{f}$

Suppose $(X,\mathcal{S},\mu)$ is a measure space and $0 < p \le \infty$.

• $\mathcal{Z}(\mu)$ denotes the set of $\mathcal{S}$-measurable functions from $X$ to $\mathbf{F}$ that equal 0 almost everywhere.

• For $f \in \mathcal{L}^p(\mu)$, let $\tilde{f}$ be the subset of $\mathcal{L}^p(\mu)$ defined by
$$\tilde{f} = \{f + z : z \in \mathcal{Z}(\mu)\}.$$

The set $\mathcal{Z}(\mu)$ is clearly closed under scalar multiplication. Also, $\mathcal{Z}(\mu)$ is closed under addition because the union of two sets with $\mu$-measure 0 is a set with $\mu$-measure 0. Thus $\mathcal{Z}(\mu)$ is a subspace of $\mathcal{L}^p(\mu)$, as we had noted in the third bullet point of Example 6.32.

Note that if $f, F \in \mathcal{L}^p(\mu)$, then $\tilde{f} = \tilde{F}$ if and only if $f(x) = F(x)$ for almost every $x \in X$.


7.16 Definition $L^p(\mu)$

Suppose $\mu$ is a measure and $0 < p \le \infty$.

• Let $L^p(\mu)$ denote the collection of subsets of $\mathcal{L}^p(\mu)$ defined by
$$L^p(\mu) = \{\tilde{f} : f \in \mathcal{L}^p(\mu)\}.$$

• For $\tilde{f}, \tilde{g} \in L^p(\mu)$ and $\alpha \in \mathbf{F}$, define $\tilde{f} + \tilde{g}$ and $\alpha\tilde{f}$ by
$$\tilde{f} + \tilde{g} = (f + g)^\sim \quad\text{and}\quad \alpha\tilde{f} = (\alpha f)^\sim.$$

The last bullet point in the definition above requires a bit of care to verify that it makes sense. The potential problem is that if $\mathcal{Z}(\mu) \neq \{0\}$, then $\tilde{f}$ is not uniquely represented by $f$. Thus suppose $f, F, g, G \in \mathcal{L}^p(\mu)$ and $\tilde{f} = \tilde{F}$ and $\tilde{g} = \tilde{G}$. For the definition of addition in $L^p(\mu)$ to make sense, we must verify that $(f + g)^\sim = (F + G)^\sim$. This verification is left to the reader, as is the similar verification that the scalar multiplication defined in the last bullet point above makes sense.

You might want to think of elements of $L^p(\mu)$ as equivalence classes of functions in $\mathcal{L}^p(\mu)$, where two functions are equivalent if they agree almost everywhere.
Mathematicians often pretend that elements of $L^p(\mu)$ are functions, where two functions are considered to be equal if they differ only on a set of $\mu$-measure 0. This fiction is harmless provided that the operations you perform with such “functions” produce the same results if the functions are changed on a set of measure 0.

Note the subtle typographic difference between $\mathcal{L}^p(\mu)$ and $L^p(\mu)$. An element of the calligraphic $\mathcal{L}^p(\mu)$ is a function; an element of the italic $L^p(\mu)$ is a set of functions, any two of which agree almost everywhere.


7.17 Definition $\|\cdot\|_p$ on $L^p(\mu)$


Suppose $\mu$ is a measure and $0 < p \le \infty$. Define $\|\cdot\|_p$ on $L^p(\mu)$ by


$$\|\tilde{f}\|_p = \|f\|_p$$
for $f \in \mathcal{L}^p(\mu)$.


Note that if $f, F \in \mathcal{L}^p(\mu)$ and $\tilde{f} = \tilde{F}$, then $\|f\|_p = \|F\|_p$. Thus the definition
above makes sense.
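As a concrete illustration (an assumed toy example, not from the text), take $X = \{0, 1, 2, 3\}$ with a hypothetical measure $\mu$ given by point weights, one of which is 0. The Python sketch below shows that changing a function on the null set changes neither its class in $L^p(\mu)$ nor its $p$-norm, and that the characteristic function of the null set has $p$-norm 0 without being the zero function. The helper names `p_norm` and `agree_ae` are our own.

```python
def p_norm(values, weights, p):
    """||f||_p = (sum_x |f(x)|^p mu({x}))^(1/p) on a finite set, 0 < p < infinity."""
    total = sum(w * abs(v) ** p for v, w in zip(values, weights))
    return total ** (1.0 / p)


def agree_ae(f, F, weights):
    """True when f(x) = F(x) at every point of positive measure."""
    return all(a == b for a, b, w in zip(f, F, weights) if w > 0)


# Hypothetical measure: mu({0}) = 0.5, mu({1}) = 0, mu({2}) = 2, mu({3}) = 1.
weights = [0.5, 0.0, 2.0, 1.0]
f = [1.0, 7.0, -2.0, 3j]
F = [1.0, -100.0, -2.0, 3j]        # differs from f only on the null set {1}
chi_null = [0.0, 1.0, 0.0, 0.0]    # characteristic function of {1}

print(agree_ae(f, F, weights))                          # True: same class in L^p
print(p_norm(f, weights, 3) == p_norm(F, weights, 3))   # True: same p-norm
print(p_norm(chi_null, weights, 2))                     # 0.0, although chi_null != 0
```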

In the result below, the addition and scalar multiplication on $L^p(\mu)$ come from
7.16 and the norm comes from 7.17.


7.18 $L^p(\mu)$ is a normed vector space


Suppose $\mu$ is a measure and $1 \le p \le \infty$. Then $L^p(\mu)$ is a vector space and $\|\cdot\|_p$
is a norm on $L^p(\mu)$.


The proof of the result above is left to the reader, who will surely use Minkowski’s
inequality (7.14) to verify the triangle inequality. Note that the additive identity of
$L^p(\mu)$ is $\tilde{0}$, which equals $\mathcal{Z}(\mu)$.
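Minkowski's inequality is what supplies the triangle inequality in 7.18. The following Python sketch is a numerical sanity check, not a proof: it tests $\|f + g\|_p \le \|f\|_p + \|g\|_p$ for random complex-valued functions on a 40-point set with an assumed weighted counting measure. The weights, exponents, sample sizes, and relative tolerance (which only absorbs floating-point rounding) are arbitrary choices.

```python
import random


def p_norm(values, weights, p):
    """||f||_p for the weighted counting measure mu({i}) = weights[i], 1 <= p < infinity."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def triangle_inequality_holds(f, g, weights, p, rel_tol=1e-12):
    """Check ||f + g||_p <= ||f||_p + ||g||_p, allowing for floating-point rounding."""
    lhs = p_norm([a + b for a, b in zip(f, g)], weights, p)
    rhs = p_norm(f, weights, p) + p_norm(g, weights, p)
    return lhs <= rhs * (1 + rel_tol)


rng = random.Random(0)
n = 40
weights = [rng.uniform(0.0, 2.0) for _ in range(n)]
all_ok = True
for p in (1.0, 1.5, 2.0, 4.0):
    for _ in range(200):
        f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        all_ok = all_ok and triangle_inequality_holds(f, g, weights, p)
print(all_ok)   # True
```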

For readers familiar with quotients of vector spaces: you may recognize that $L^p(\mu)$ is the quotient space $\mathcal{L}^p(\mu)/\mathcal{Z}(\mu)$.

For readers who want to learn about quotients of vector spaces: see a textbook for a second course in linear algebra.

If $\mu$ is counting measure on $\mathbf{Z}^+$, then $L^p(\mu) = \mathcal{L}^p(\mu) = \ell^p$ because counting measure has no sets of measure 0 other than the empty set.

In the next definition, note that if $E$ is a Borel set then 2.95 implies $L^p(E)$ using Borel measurable functions equals $L^p(E)$ using Lebesgue measurable functions.


7.19 Definition $L^p(E)$


If $E$ is a Borel (or Lebesgue measurable) subset of $\mathbf{R}$ and $0 < p \le \infty$, then
$L^p(E)$ means $L^p(\lambda_E)$, where $\lambda_E$ denotes Lebesgue measure $\lambda$ restricted to the
Borel (or Lebesgue measurable) subsets of $\mathbf{R}$ that are contained in $E$.
$L^p(\mu)$ Is a Banach Space


The proof of the next result does all the hard work we need to prove that $L^p(\mu)$ is a
Banach space. However, we state the next result in terms of $\mathcal{L}^p(\mu)$ instead of $L^p(\mu)$
so that we can work with genuine functions. Moving to $L^p(\mu)$ will then be easy (see
7.24).


7.20 Cauchy sequences in $\mathcal{L}^p(\mu)$ converge


Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $1 \le p \le \infty$. Suppose $f_1, f_2, \dots$ is a
sequence of functions in $\mathcal{L}^p(\mu)$ such that for every $\varepsilon > 0$, there exists $n \in \mathbf{Z}^+$
such that


$$\|f_j - f_k\|_p < \varepsilon$$


for all $j \ge n$ and $k \ge n$. Then there exists $f \in \mathcal{L}^p(\mu)$ such that


$$\lim_{k \to \infty} \|f_k - f\|_p = 0.$$


Proof The case $p = \infty$ is left as an exercise for the reader. Thus assume $1 \le p < \infty$.
It suffices to show that $\lim_{m \to \infty} \|f_{k_m} - f\|_p = 0$ for some $f \in \mathcal{L}^p(\mu)$ and some
subsequence $f_{k_1}, f_{k_2}, \dots$ (see Exercise 14 of Section 6A, whose proof does not require
the positive definite property of a norm).
Thus dropping to a subsequence (but not relabeling) and setting $f_0 = 0$, we can
assume that


$$\sum_{k=1}^{\infty} \|f_k - f_{k-1}\|_p < \infty.$$


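Define functions $g_1, g_2, \dots$ and $g$ from $X$ to $[0, \infty]$ by


$$g_m(x) = \sum_{k=1}^{m} |f_k(x) - f_{k-1}(x)| \quad \text{and} \quad g(x) = \sum_{k=1}^{\infty} |f_k(x) - f_{k-1}(x)|.$$
Minkowski’s inequality (7.14) implies that


$$\|g_m\|_p \le \sum_{k=1}^{m} \|f_k - f_{k-1}\|_p. \tag{7.21}$$


Clearly $\lim_{m \to \infty} g_m(x) = g(x)$ for every $x \in X$. Thus the Monotone Convergence
Theorem (3.11) and 7.21 imply


$$\int g^p \, d\mu = \lim_{m \to \infty} \int g_m^p \, d\mu \le \Bigl(\sum_{k=1}^{\infty} \|f_k - f_{k-1}\|_p\Bigr)^p < \infty. \tag{7.22}$$


Thus $g(x) < \infty$ for almost every $x \in X$.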

Because every infinite series of elements of $\mathbf{F}$ that converges absolutely also converges, for almost every $x \in X$ we can define $f(x)$ by

$$f(x) = \sum_{k=1}^{\infty} \bigl(f_k(x) - f_{k-1}(x)\bigr) = \lim_{m \to \infty} \sum_{k=1}^{m} \bigl(f_k(x) - f_{k-1}(x)\bigr) = \lim_{m \to \infty} f_m(x).$$

In particular, $\lim_{m \to \infty} f_m(x)$ exists for almost every $x \in X$. Define $f(x)$ to be 0 for
those $x \in X$ for which the limit does not exist.

We now have a function $f$ that is the pointwise limit (almost everywhere) of
$f_1, f_2, \dots$. The definition of $f$ shows that $|f(x)| \le g(x)$ for almost every $x \in X$.
Thus 7.22 shows that $f \in \mathcal{L}^p(\mu)$.

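To show that $\lim_{k \to \infty} \|f_k - f\|_p = 0$, suppose $\varepsilon > 0$ and let $n \in \mathbf{Z}^+$ be such that
$\|f_j - f_k\|_p < \varepsilon$ for all $j \ge n$ and $k \ge n$. Suppose $k \ge n$. Then


$$\begin{aligned}
\|f_k - f\|_p &= \Bigl(\int |f_k - f|^p \, d\mu\Bigr)^{1/p} \\
&\le \Bigl(\liminf_{j \to \infty} \int |f_k - f_j|^p \, d\mu\Bigr)^{1/p} \\
&= \liminf_{j \to \infty} \|f_k - f_j\|_p \\
&\le \varepsilon,
\end{aligned}$$


where the second line above comes from Fatou’s Lemma (Exercise 17 in Section 3A).
Thus $\lim_{k \to \infty} \|f_k - f\|_p = 0$, as desired.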
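The two estimates driving the proof, $|f_m| \le g_m$ pointwise and 7.21, can be spot-checked numerically. The Python sketch below works on a hypothetical 30-point measure space with random weights and builds $f_0 = 0$ and increments of size roughly $2^{-k}$, imitating the subsequence chosen in the proof so that $\sum_k \|f_k - f_{k-1}\|_p < \infty$. It illustrates the inequalities only; it says nothing about the measure-theoretic limit arguments.

```python
import random


def p_norm(values, weights, p):
    """||f||_p for the weighted counting measure mu({i}) = weights[i], 1 <= p < infinity."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


rng = random.Random(1)
n_points, p, m = 30, 1.5, 12
weights = [rng.uniform(0.1, 1.0) for _ in range(n_points)]

# f_0 = 0, and f_k = f_{k-1} + (random increment scaled by 2^-k).
fs = [[0.0] * n_points]
for k in range(1, m + 1):
    step = [rng.gauss(0, 1) * 2.0 ** -k for _ in range(n_points)]
    fs.append([a + b for a, b in zip(fs[-1], step)])

diffs = [[a - b for a, b in zip(fs[k], fs[k - 1])] for k in range(1, m + 1)]

# g_m(x) = sum_{k=1}^m |f_k(x) - f_{k-1}(x)|
g_m = [sum(abs(d[i]) for d in diffs) for i in range(n_points)]

norm_g_m = p_norm(g_m, weights, p)                        # ||g_m||_p
sum_of_norms = sum(p_norm(d, weights, p) for d in diffs)  # sum_k ||f_k - f_{k-1}||_p
print(norm_g_m <= sum_of_norms * (1 + 1e-12))             # True (inequality 7.21)
print(all(abs(fs[m][i]) <= g_m[i] + 1e-12 for i in range(n_points)))  # True: |f_m| <= g_m
```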


The proof that we have just completed contains within it the proof of a useful 
result that is worth stating separately. A sequence can converge in p-norm without 
converging pointwise anywhere (see, for example, Exercise 12). However, the next 
result guarantees that some subsequence converges pointwise almost everywhere. 


7.23 convergent sequences in $\mathcal{L}^p$ have pointwise convergent subsequences


Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $1 \le p \le \infty$. Suppose $f \in \mathcal{L}^p(\mu)$
and $f_1, f_2, \dots$ is a sequence of functions in $\mathcal{L}^p(\mu)$ such that $\lim_{k \to \infty} \|f_k - f\|_p = 0$.


Then there exists a subsequence $f_{k_1}, f_{k_2}, \dots$ such that
$$\lim_{m \to \infty} f_{k_m}(x) = f(x)$$


for almost every $x \in X$.


Proof Suppose $f_{k_1}, f_{k_2}, \dots$ is a subsequence such that


$$\sum_{m=2}^{\infty} \|f_{k_m} - f_{k_{m-1}}\|_p < \infty.$$


An examination of the proof of 7.20 shows that $\lim_{m \to \infty} f_{k_m}(x) = f(x)$ for almost
every $x \in X$.


7.24 $L^p(\mu)$ is a Banach space


Suppose $\mu$ is a measure and $1 \le p \le \infty$. Then $L^p(\mu)$ is a Banach space.


Proof This result follows immediately from 7.20 and the appropriate definitions.

Duality


Recall that the dual space of a normed vector space V is denoted by V’ and is defined 
to be the Banach space of bounded linear functionals on V (see 6.71). 

In the statement and proof of the next result, an element of an $L^p$ space is denoted
by a symbol that makes it look like a function rather than like a collection of functions
that agree except on a set of measure 0. However, because integrals and $L^p$-norms
are unchanged when functions change only on a set of measure 0, this notational
convenience causes no problems.


7.25 natural map of $L^{p'}(\mu)$ into $(L^p(\mu))'$ preserves norms


Suppose $\mu$ is a measure and $1 < p \le \infty$. For $h \in L^{p'}(\mu)$, define $\varphi_h \colon L^p(\mu) \to \mathbf{F}$
by


$$\varphi_h(f) = \int f h \, d\mu.$$


Then $h \mapsto \varphi_h$ is a one-to-one linear map from $L^{p'}(\mu)$ to $(L^p(\mu))'$. Furthermore,
$\|\varphi_h\| = \|h\|_{p'}$ for all $h \in L^{p'}(\mu)$.


Proof Suppose $h \in L^{p'}(\mu)$ and $f \in L^p(\mu)$. Then Hölder’s inequality (7.9) tells us
that $fh \in L^1(\mu)$ and that
$$\|fh\|_1 \le \|h\|_{p'} \|f\|_p.$$


Thus $\varphi_h$, as defined above, is a bounded linear map from $L^p(\mu)$ to $\mathbf{F}$. Also, the map
$h \mapsto \varphi_h$ is clearly a linear map of $L^{p'}(\mu)$ into $(L^p(\mu))'$. Now 7.12 (with the roles of
$p$ and $p'$ reversed) shows that


$$\|\varphi_h\| = \sup\{|\varphi_h(f)| : f \in L^p(\mu) \text{ and } \|f\|_p \le 1\} = \|h\|_{p'}.$$
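If $h_1, h_2 \in L^{p'}(\mu)$ and $\varphi_{h_1} = \varphi_{h_2}$, then


$$\|h_1 - h_2\|_{p'} = \|\varphi_{h_1 - h_2}\| = \|\varphi_{h_1} - \varphi_{h_2}\| = \|0\| = 0,$$
which implies $h_1 = h_2$. Thus $h \mapsto \varphi_h$ is a one-to-one map from $L^{p'}(\mu)$ to $(L^p(\mu))'$.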
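For $1 < p < \infty$ the supremum in the proof is attained. Writing $p'$ for the dual exponent, the function $f = \overline{\operatorname{sgn} h}\,|h|^{p'-1}$ gives $\varphi_h(f) = \int |h|^{p'} \, d\mu$ and $\|f\|_p = \bigl(\int |h|^{p'} \, d\mu\bigr)^{1/p}$, so $|\varphi_h(f)| = \|h\|_{p'} \|f\|_p$. The Python sketch below checks this, and Hölder's bound for a random $f$, on an assumed weighted counting measure on 25 points; the helper `norming_function` is a name we introduce for illustration.

```python
import random


def p_norm(values, weights, p):
    """||f||_p for the weighted counting measure mu({i}) = weights[i], 1 <= p < infinity."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def phi(h, f, weights):
    """phi_h(f) = integral of f h d(mu)."""
    return sum(w * a * b for a, b, w in zip(f, h, weights))


def norming_function(h, p):
    """f = conj(sgn h) |h|^(p' - 1), where p' = p / (p - 1); requires 1 < p < infinity."""
    q = p / (p - 1)
    return [0.0 if b == 0 else b.conjugate() / abs(b) * abs(b) ** (q - 1) for b in h]


rng = random.Random(4)
n, p = 25, 3.0
q = p / (p - 1)                      # dual exponent p'
weights = [rng.uniform(0.1, 2.0) for _ in range(n)]
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

f_star = norming_function(h, p)
ratio = abs(phi(h, f_star, weights)) / p_norm(f_star, weights, p)
print(abs(ratio - p_norm(h, weights, q)) < 1e-9)   # True: the bound is attained

f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
bound = p_norm(h, weights, q) * p_norm(f, weights, p)
print(abs(phi(h, f, weights)) <= bound * (1 + 1e-12))   # True: Hoelder's inequality
```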

The result in 7.25 fails for some measures $\mu$ if $p = 1$. However, if $\mu$ is a $\sigma$-finite
measure, then 7.25 holds even if $p = 1$ (see Exercise 14).

Is the range of the map $h \mapsto \varphi_h$ in 7.25 all of $(L^p(\mu))'$? The next result provides
an affirmative answer to this question in the special case of $\ell^p$ for $1 \le p < \infty$.
We will deal with this question for more general measures later (see 9.42; also see
Exercise 25 in Section 8B).

When thinking of $\ell^p$ as a normed vector space, as in the next result, unless stated
otherwise you should always assume that the norm on $\ell^p$ is the usual norm $\|\cdot\|_p$ that
is associated with $L^p(\mu)$, where $\mu$ is counting measure on $\mathbf{Z}^+$. In other words, if
$1 \le p < \infty$, then


$$\|(a_1, a_2, \dots)\|_p = \Bigl(\sum_{k=1}^{\infty} |a_k|^p\Bigr)^{1/p}.$$
7.26 dual space of $\ell^p$ can be identified with $\ell^{p'}$


Suppose $1 \le p < \infty$. For $b = (b_1, b_2, \dots) \in \ell^{p'}$, define $\varphi_b \colon \ell^p \to \mathbf{F}$ by


$$\varphi_b(a) = \sum_{k=1}^{\infty} a_k b_k,$$


where $a = (a_1, a_2, \dots)$. Then $b \mapsto \varphi_b$ is a one-to-one linear map from $\ell^{p'}$ onto
$(\ell^p)'$. Furthermore, $\|\varphi_b\| = \|b\|_{p'}$ for all $b \in \ell^{p'}$.


Proof For $k \in \mathbf{Z}^+$, let $e_k \in \ell^p$ be the sequence in which each term is 0 except that
the $k^{\text{th}}$ term is 1; thus $e_k = (0, \dots, 0, 1, 0, \dots)$.
Suppose $\varphi \in (\ell^p)'$. Define a sequence $b = (b_1, b_2, \dots)$ of numbers in $\mathbf{F}$ by


$$b_k = \varphi(e_k).$$
Suppose $a = (a_1, a_2, \dots) \in \ell^p$. Then
$$a = \sum_{k=1}^{\infty} a_k e_k,$$


where the infinite sum converges in the norm of $\ell^p$ (the proof would fail here if we
allowed $p$ to be $\infty$). Because $\varphi$ is a bounded linear functional on $\ell^p$, applying $\varphi$ to
both sides of the equation above shows that


$$\varphi(a) = \sum_{k=1}^{\infty} a_k b_k.$$


We still need to prove that $b \in \ell^{p'}$. To do this, for $n \in \mathbf{Z}^+$ let $\mu_n$ be counting
measure on $\{1, 2, \dots, n\}$. We can think of $L^p(\mu_n)$ as a subspace of $\ell^p$ by identifying each $(a_1, \dots, a_n) \in L^p(\mu_n)$ with $(a_1, \dots, a_n, 0, 0, \dots) \in \ell^p$. Restricting the
linear functional $\varphi$ to $L^p(\mu_n)$ gives the linear functional on $L^p(\mu_n)$ that satisfies the
following equation:


$$\varphi|_{L^p(\mu_n)}(a_1, \dots, a_n) = \sum_{k=1}^{n} a_k b_k.$$
Now 7.25 (also see Exercise 14 for the case where $p = 1$) gives


$$\|(b_1, \dots, b_n)\|_{p'} = \bigl\|\varphi|_{L^p(\mu_n)}\bigr\| \le \|\varphi\|.$$


Because $\lim_{n \to \infty} \|(b_1, \dots, b_n)\|_{p'} = \|b\|_{p'}$, the inequality above implies the inequality $\|b\|_{p'} \le \|\varphi\|$. Thus $b \in \ell^{p'}$, which implies that $\varphi = \varphi_b$, completing the
proof.
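The final step uses the fact that the truncated norms $\|(b_1, \dots, b_n)\|_{p'}$ increase to $\|b\|_{p'}$. The short Python illustration below takes the example $b_k = 1/k$ with $p = p' = 2$, chosen only for convenience; here $\|b\|_2 = \pi/\sqrt{6}$ because $\sum_{k=1}^{\infty} 1/k^2 = \pi^2/6$.

```python
import math


def lp_norm(seq, p):
    """(sum |x_k|^p)^(1/p) for a finite sequence, 1 <= p < infinity."""
    return sum(abs(x) ** p for x in seq) ** (1.0 / p)


b = [1.0 / k for k in range(1, 100_001)]
limit = math.pi / math.sqrt(6)       # ||b||_2 for b_k = 1/k
truncated = [lp_norm(b[:n], 2) for n in (10, 100, 1_000, 100_000)]

print(all(x < y for x, y in zip(truncated, truncated[1:])))   # True: increasing in n
print(all(x < limit for x in truncated))                      # True: bounded by ||b||_2
print(abs(truncated[-1] - limit) < 1e-5)                      # True: converging to it
```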


The previous result does not hold when $p = \infty$. In other words, the dual space
of $\ell^\infty$ cannot be identified with $\ell^1$. However, see Exercise 15, which shows that the
dual space of a natural subspace of $\ell^\infty$ can be identified with $\ell^1$.

EXERCISES 7B


1 Suppose $n > 1$ and $0 < p < 1$. Prove that if $\|\cdot\|$ is defined on $\mathbf{F}^n$ by
$$\|(a_1, \dots, a_n)\| = \bigl(|a_1|^p + \dots + |a_n|^p\bigr)^{1/p},$$
then $\|\cdot\|$ is not a norm on $\mathbf{F}^n$.
2 (a) Suppose $1 \le p < \infty$. Prove that there is a countable subset of $\ell^p$ whose
closure equals $\ell^p$.

(b) Prove that there does not exist a countable subset of $\ell^\infty$ whose closure
equals $\ell^\infty$.

3 (a) Suppose $1 \le p < \infty$. Prove that there is a countable subset of $L^p(\mathbf{R})$
whose closure equals $L^p(\mathbf{R})$.

(b) Prove that there does not exist a countable subset of $L^\infty(\mathbf{R})$ whose closure
equals $L^\infty(\mathbf{R})$.


4 Suppose $(X, \mathcal{S}, \mu)$ is a $\sigma$-finite measure space and $1 \le p \le \infty$. Prove that
if $f \colon X \to \mathbf{F}$ is an $\mathcal{S}$-measurable function such that $fh \in L^1(\mu)$ for every
$h \in L^{p'}(\mu)$, then $f \in L^p(\mu)$.


5 (a) Prove that if $\mu$ is a measure, $1 < p < \infty$, and $f, g \in L^p(\mu)$ are such that
$$\|f\|_p = \|g\|_p = \Bigl\|\frac{f + g}{2}\Bigr\|_p,$$
then $f = g$.

(b) Give an example to show that the result in part (a) can fail if $p = 1$.

(c) Give an example to show that the result in part (a) can fail if $p = \infty$.
6 Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $0 < p < 1$. Show that
$$\|f + g\|_p^p \le \|f\|_p^p + \|g\|_p^p$$
for all $\mathcal{S}$-measurable functions $f, g \colon X \to \mathbf{F}$.


7 Prove that $L^p(\mu)$, with addition and scalar multiplication as defined in 7.16 and
norm defined as in 7.17, is a normed vector space. In other words, prove 7.18.

8 Prove 7.20 for the case $p = \infty$.

9 Prove that 7.20 also holds for $p \in (0, 1)$.

10 Prove that 7.23 also holds for $p \in (0, 1)$.
11 Suppose $1 \le p \le \infty$. Prove that
$$\{(a_1, a_2, \dots) \in \ell^p : a_k \neq 0 \text{ for every } k \in \mathbf{Z}^+\}$$
is not an open subset of $\ell^p$.
12 Show that there exists a sequence $f_1, f_2, \dots$ of functions in $\mathcal{L}^1([0, 1])$ such that
$\lim_{k \to \infty} \|f_k\|_1 = 0$ but
$$\sup\{f_k(x) : k \in \mathbf{Z}^+\} = \infty$$
for every $x \in [0, 1]$.
[This exercise shows that the conclusion of 7.23 cannot be improved to conclude
that $\lim_{k \to \infty} f_k(x) = f(x)$ for almost every $x \in X$.]


13 Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $1 \le p \le \infty$, $f \in \mathcal{L}^p(\mu)$, and $f_1, f_2, \dots$
is a sequence in $\mathcal{L}^p(\mu)$ such that $\lim_{k \to \infty} \|f_k - f\|_p = 0$. Show that if
$g \colon X \to \mathbf{F}$ is a function such that $\lim_{k \to \infty} f_k(x) = g(x)$ for almost every
$x \in X$, then $f(x) = g(x)$ for almost every $x \in X$.


14 (a) Give an example of a measure $\mu$ such that 7.25 fails for $p = 1$.

(b) Show that if $\mu$ is a $\sigma$-finite measure, then 7.25 holds for $p = 1$.


15 Let
$$c_0 = \bigl\{(a_1, a_2, \dots) \in \ell^\infty : \lim_{k \to \infty} a_k = 0\bigr\}.$$
Give $c_0$ the norm that it inherits as a subspace of $\ell^\infty$.

(a) Prove that $c_0$ is a Banach space.

(b) Prove that the dual space of $c_0$ can be identified with $\ell^1$.
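16 Suppose $1 < p \le 2$.

(a) Prove that if $w, z \in \mathbf{C}$, then
$$\frac{|w + z|^p + |w - z|^p}{2} \le |w|^p + |z|^p \le \frac{|w + z|^p + |w - z|^p}{2^{p-1}}.$$

(b) Prove that if $\mu$ is a measure and $f, g \in L^p(\mu)$, then
$$\frac{\|f + g\|_p^p + \|f - g\|_p^p}{2} \le \|f\|_p^p + \|g\|_p^p \le \frac{\|f + g\|_p^p + \|f - g\|_p^p}{2^{p-1}}.$$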


17 Suppose $2 \le p < \infty$.

(a) Prove that if $w, z \in \mathbf{C}$, then
$$\frac{|w + z|^p + |w - z|^p}{2^{p-1}} \le |w|^p + |z|^p \le \frac{|w + z|^p + |w - z|^p}{2}.$$

(b) Prove that if $\mu$ is a measure and $f, g \in \mathcal{L}^p(\mu)$, then
$$\frac{\|f + g\|_p^p + \|f - g\|_p^p}{2^{p-1}} \le \|f\|_p^p + \|g\|_p^p \le \frac{\|f + g\|_p^p + \|f - g\|_p^p}{2}.$$

[The inequalities in the two previous exercises are called Clarkson’s inequalities.
They were discovered by James Clarkson in 1936.]
18 Suppose $(X, \mathcal{S}, \mu)$ is a measure space, $1 \le p, q \le \infty$, and $h \colon X \to \mathbf{F}$ is an
$\mathcal{S}$-measurable function such that $hf \in L^q(\mu)$ for every $f \in L^p(\mu)$. Prove that
$f \mapsto hf$ is a continuous linear map from $L^p(\mu)$ to $L^q(\mu)$.


A Banach space is called reflexive if the canonical isometry of the Banach space
into its double dual space is surjective (see Exercise 20 in Section 6D for the definitions of the double dual space and the canonical isometry).


19 Prove that if $1 < p < \infty$, then $\ell^p$ is reflexive.

20 Prove that $\ell^1$ is not reflexive.


21 Show that with the natural identifications, the canonical isometry of $c_0$ into its
double dual space is the inclusion map of $c_0$ into $\ell^\infty$ (see Exercise 15 for the
definition of $c_0$ and an identification of its dual space).


22 Suppose $1 \le p < \infty$ and $V, W$ are Banach spaces. Show that $V \times W$ is a
Banach space if the norm on $V \times W$ is defined by


$$\|(f, g)\| = \bigl(\|f\|^p + \|g\|^p\bigr)^{1/p}$$
for $f \in V$ and $g \in W$.
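Before proving the inequalities in Exercises 16 and 17, it may help to see them hold numerically. The Python sketch below only spot-checks the scalar inequalities in part (a) of each exercise at randomly chosen complex $w, z$; it is a sanity check, not a proof, and the exponents, sample size, and tolerance are arbitrary choices.

```python
import random


def clarkson_scalar_ok(w, z, p, rel_tol=1e-12):
    """Check part (a) of Exercise 16 (1 < p <= 2) or Exercise 17 (2 <= p < infinity)."""
    s = abs(w + z) ** p + abs(w - z) ** p
    middle = abs(w) ** p + abs(z) ** p
    if p <= 2:
        lower, upper = s / 2, s / 2 ** (p - 1)
    else:
        lower, upper = s / 2 ** (p - 1), s / 2
    return lower <= middle * (1 + rel_tol) and middle <= upper * (1 + rel_tol)


rng = random.Random(2)
results = []
for p in (1.1, 1.5, 2.0, 3.0, 6.0):
    for _ in range(5_000):
        w = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        z = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        results.append(clarkson_scalar_ok(w, z, p))
print(all(results))   # True
```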


Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 
4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial 
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give 
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license 
and indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not included 
in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation 
or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 


Chapter 8


Hilbert Spaces 


Normed vector spaces and Banach spaces, which were introduced in Chapter 6,
capture the notion of distance. In this chapter we introduce inner product spaces,
which capture the notion of angle. The concept of orthogonality, which corresponds
to right angles in the familiar context of $\mathbf{R}^2$ or $\mathbf{R}^3$, plays a particularly important role
in inner product spaces.

Just as a Banach space is defined to be a normed vector space in which every
Cauchy sequence converges, a Hilbert space is defined to be an inner product space
that is a Banach space. Hilbert spaces are named in honor of David Hilbert (1862–1943), who helped develop parts of the theory in the early twentieth century.

In this chapter, we will see a clean description of the bounded linear functionals
on a Hilbert space. We will also see that every Hilbert space has an orthonormal
basis, which makes Hilbert spaces look much like standard Euclidean spaces but with
infinite sums replacing finite sums.
8A Inner Product Spaces


Inner Products 


If $p = 2$, then the dual exponent $p'$ also equals 2. In this special case Hölder’s
inequality (7.9) implies that if $\mu$ is a measure, then


$$\Bigl|\int f g \, d\mu\Bigr| \le \|f\|_2 \|g\|_2$$


for all $f, g \in L^2(\mu)$. Thus we can associate with each pair of functions $f, g \in L^2(\mu)$
a number $\int f g \, d\mu$. An inner product is almost a generalization of this pairing, with a
slight twist to get a closer connection to the $L^2(\mu)$-norm.

If $g = f$ and $\mathbf{F} = \mathbf{R}$, then the left side of the inequality above is $\|f\|_2^2$. However,
if $g = f$ and $\mathbf{F} = \mathbf{C}$, then the left side of the inequality above need not equal $\|f\|_2^2$.
Instead, we should take $g = \bar{f}$ to get $\|f\|_2^2$ above.

The observations above suggest that we should consider the pairing that takes $f, g$
to $\int f \bar{g} \, d\mu$. Then pairing $f$ with itself gives $\|f\|_2^2$.

Now we are ready to define inner products, which abstract the key properties of
the pairing $f, g \mapsto \int f \bar{g} \, d\mu$ on $L^2(\mu)$, where $\mu$ is a measure.


8.1 Definition inner product; inner product space 


An inner product on a vector space $V$ is a function that takes each ordered pair
$f, g$ of elements of $V$ to a number $\langle f, g \rangle \in \mathbf{F}$ and has the following properties:


• positivity
$\langle f, f \rangle \in [0, \infty)$ for all $f \in V$;


• definiteness
$\langle f, f \rangle = 0$ if and only if $f = 0$;


• linearity in first slot
$\langle f + g, h \rangle = \langle f, h \rangle + \langle g, h \rangle$ and $\langle \alpha f, g \rangle = \alpha \langle f, g \rangle$ for all $f, g, h \in V$ and
all $\alpha \in \mathbf{F}$;


• conjugate symmetry
$\langle f, g \rangle = \overline{\langle g, f \rangle}$ for all $f, g \in V$.


A vector space with an inner product on it is called an inner product space. The
terminology real inner product space indicates that $\mathbf{F} = \mathbf{R}$; the terminology
complex inner product space indicates that $\mathbf{F} = \mathbf{C}$.


If $\mathbf{F} = \mathbf{R}$, then the complex conjugate above can be ignored and the conjugate
symmetry property above can be rewritten more simply as $\langle f, g \rangle = \langle g, f \rangle$ for all
$f, g \in V$.

Although most mathematicians define an inner product as above, many physicists 
use a definition that requires linearity in the second slot instead of the first slot. 


8.2 Example inner product spaces
• For $n \in \mathbf{Z}^+$, define an inner product on $\mathbf{F}^n$ by
$$\langle (a_1, \dots, a_n), (b_1, \dots, b_n) \rangle = a_1 \overline{b_1} + \dots + a_n \overline{b_n}$$


for $(a_1, \dots, a_n), (b_1, \dots, b_n) \in \mathbf{F}^n$. When thinking of $\mathbf{F}^n$ as an inner product
space, we always mean this inner product unless the context indicates some other
inner product.


• Define an inner product on $\ell^2$ by
$$\langle (a_1, a_2, \dots), (b_1, b_2, \dots) \rangle = \sum_{k=1}^{\infty} a_k \overline{b_k}$$


for $(a_1, a_2, \dots), (b_1, b_2, \dots) \in \ell^2$. Hölder’s inequality (7.9), as applied to counting
measure on $\mathbf{Z}^+$ and taking $p = 2$, implies that the infinite sum above
converges absolutely and hence converges to an element of $\mathbf{F}$. When thinking
of $\ell^2$ as an inner product space, we always mean this inner product unless the
context indicates some other inner product.


• Define an inner product on $C([0, 1])$, which is the vector space of continuous
functions from $[0, 1]$ to $\mathbf{F}$, by


$$\langle f, g \rangle = \int_0^1 f \bar{g}$$


for $f, g \in C([0, 1])$. The definiteness requirement for an inner product is
satisfied because if $f \colon [0, 1] \to \mathbf{F}$ is a continuous function such that $\int_0^1 f \bar{f} = 0$,
then the function $f$ is identically 0.


• Suppose $(X, \mathcal{S}, \mu)$ is a measure space. Define an inner product on $L^2(\mu)$ by


$$\langle f, g \rangle = \int f \bar{g} \, d\mu$$


for $f, g \in L^2(\mu)$. Hölder’s inequality (7.9) with $p = 2$ implies that the integral
above makes sense as an element of $\mathbf{F}$. When thinking of $L^2(\mu)$ as an inner
product space, we always mean this inner product unless the context indicates
some other inner product.


Here we use $L^2(\mu)$ rather than $\mathcal{L}^2(\mu)$ because the definiteness requirement fails
on $\mathcal{L}^2(\mu)$ if there exist nonempty sets $E \in \mathcal{S}$ such that $\mu(E) = 0$ (consider
$\langle \chi_E, \chi_E \rangle$ to see the problem).


The first two bullet points in this example are special cases of $L^2(\mu)$, taking $\mu$ to
be counting measure on either $\{1, \dots, n\}$ or $\mathbf{Z}^+$.
As we will see, even though the main examples of inner product spaces are $L^2(\mu)$
spaces, working with the inner product structure is often cleaner and simpler than
working with measures and integrals.
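The first bullet point of 8.2 translates directly into code. In the Python sketch below (the function name `inner` is our own), the complex conjugate sits on the second slot, matching the convention in 8.1; the sample vectors in $\mathbf{C}^3$ and the scalar are arbitrary. It checks conjugate symmetry, linearity in the first slot, and that $\langle f, f \rangle$ is a nonnegative real number.

```python
import cmath


def inner(a, b):
    """<a, b> = a_1 conj(b_1) + ... + a_n conj(b_n); linear in the first slot."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


f = [1 + 2j, -3j, 0.5]
g = [2 - 1j, 4, 1j]
alpha = 2 - 3j

print(cmath.isclose(inner(f, g), inner(g, f).conjugate()))                   # True
print(cmath.isclose(inner([alpha * x for x in f], g), alpha * inner(f, g)))  # True
print(inner(f, f))   # (14.25+0j): real and nonnegative
```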


8.3 basic properties of an inner product 


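Suppose $V$ is an inner product space. Then


(a) $\langle 0, g \rangle = \langle g, 0 \rangle = 0$ for every $g \in V$;
(b) $\langle f, g + h \rangle = \langle f, g \rangle + \langle f, h \rangle$ for all $f, g, h \in V$;
(c) $\langle f, \alpha g \rangle = \bar{\alpha} \langle f, g \rangle$ for all $\alpha \in \mathbf{F}$ and $f, g \in V$.


Proof


(a) For $g \in V$, the function $f \mapsto \langle f, g \rangle$ is a linear map from $V$ to $\mathbf{F}$. Because
every linear map takes 0 to 0, we have $\langle 0, g \rangle = 0$. Now the conjugate symmetry
property of an inner product implies that

$$\langle g, 0 \rangle = \overline{\langle 0, g \rangle} = \bar{0} = 0.$$

(b) Suppose $f, g, h \in V$. Then
$$\langle f, g + h \rangle = \overline{\langle g + h, f \rangle} = \overline{\langle g, f \rangle + \langle h, f \rangle} = \overline{\langle g, f \rangle} + \overline{\langle h, f \rangle} = \langle f, g \rangle + \langle f, h \rangle.$$

(c) Suppose $\alpha \in \mathbf{F}$ and $f, g \in V$. Then


$$\langle f, \alpha g \rangle = \overline{\langle \alpha g, f \rangle} = \overline{\alpha \langle g, f \rangle} = \bar{\alpha}\, \overline{\langle g, f \rangle} = \bar{\alpha} \langle f, g \rangle,$$


as desired.


If $\mathbf{F} = \mathbf{R}$, then parts (b) and (c) of 8.3 imply that for $f \in V$, the function
$g \mapsto \langle f, g \rangle$ is a linear map from $V$ to $\mathbf{R}$. However, if $\mathbf{F} = \mathbf{C}$ and $f \neq 0$, then
the function $g \mapsto \langle f, g \rangle$ is not a linear map from $V$ to $\mathbf{C}$ because of the complex
conjugate in part (c) of 8.3.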


Cauchy—Schwarz Inequality and Triangle Inequality 


Now we can define the norm associated with each inner product. We use the word 
norm (which will turn out to be correct) even though it is not yet clear that all the 
properties required of a norm are satisfied. 


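8.4 Definition norm associated with an inner product; $\|\cdot\|$


Suppose $V$ is an inner product space. For $f \in V$, define the norm of $f$, denoted
$\|f\|$, by
$$\|f\| = \sqrt{\langle f, f \rangle}.$$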
8.5 Example norms on inner product spaces


In each of the following examples, the inner product is the standard inner product
as defined in Example 8.2.


• If $n \in \mathbf{Z}^+$ and $(a_1, \dots, a_n) \in \mathbf{F}^n$, then


$$\|(a_1, \dots, a_n)\| = \sqrt{|a_1|^2 + \dots + |a_n|^2}.$$


Thus the norm on $\mathbf{F}^n$ associated with the standard inner product is the usual
Euclidean norm.


• If $(a_1, a_2, \dots) \in \ell^2$, then


$$\|(a_1, a_2, \dots)\| = \Bigl(\sum_{k=1}^{\infty} |a_k|^2\Bigr)^{1/2}.$$


Thus the norm associated with the inner product on $\ell^2$ is just the standard norm
$\|\cdot\|_2$ on $\ell^2$ as defined in Example 7.2.


• If $\mu$ is a measure and $f \in L^2(\mu)$, then


$$\|f\| = \Bigl(\int |f|^2 \, d\mu\Bigr)^{1/2}.$$


Thus the norm associated with the inner product on $L^2(\mu)$ is just the standard
norm $\|\cdot\|_2$ on $L^2(\mu)$ as defined in 7.17.


The definition of an inner product (8.1) implies that if $V$ is an inner product space
and $f \in V$, then


• $\|f\| \ge 0$;
• $\|f\| = 0$ if and only if $f = 0$.


The proof of the next result illustrates a frequently used property of the norm on 
an inner product space: working with the square of the norm is often easier than 
working directly with the norm. 


8.6 homogeneity of the norm 


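Suppose $V$ is an inner product space, $f \in V$, and $\alpha \in \mathbf{F}$. Then


$$\|\alpha f\| = |\alpha| \, \|f\|.$$


Proof We have


$$\|\alpha f\|^2 = \langle \alpha f, \alpha f \rangle = \alpha \langle f, \alpha f \rangle = \alpha \bar{\alpha} \langle f, f \rangle = |\alpha|^2 \|f\|^2.$$


Taking square roots now gives the desired equality.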
The next definition plays a crucial role in the study of inner product spaces.


8.7 Definition orthogonal


Two elements $f, g$ of an inner product space are called orthogonal if $\langle f, g \rangle = 0$.


In the definition above, the order of the two elements of the inner product space
does not matter because $\langle f, g \rangle = 0$ if and only if $\langle g, f \rangle = 0$. Instead of saying that
$f$ and $g$ are orthogonal, sometimes we say that $f$ is orthogonal to $g$.


8.8 Example orthogonal elements of an inner product space
• In $\mathbf{C}^3$, $(2, 3, 5i)$ and $(6, 1, -3i)$ are orthogonal because
$$\langle (2, 3, 5i), (6, 1, -3i) \rangle = 2 \cdot 6 + 3 \cdot 1 + 5i \cdot \overline{(-3i)} = 12 + 3 - 15 = 0.$$


• The elements of $L^2((-\pi, \pi])$ represented by $\sin(3t)$ and $\cos(8t)$ are orthogonal because


$$\int_{-\pi}^{\pi} \sin(3t) \cos(8t) \, dt = \Bigl[\frac{\cos(5t)}{10} - \frac{\cos(11t)}{22}\Bigr]_{t=-\pi}^{t=\pi} = 0,$$


where $dt$ denotes integration with respect to Lebesgue measure on $(-\pi, \pi]$.
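Both computations in 8.8 can be confirmed numerically. In the Python sketch below, the integral is approximated by a midpoint rule with an arbitrarily chosen number of subintervals, and the antiderivative from the display above is also evaluated at the endpoints; both results vanish up to floating-point rounding. The helper names are our own.

```python
import math


def inner(a, b):
    """Standard inner product on C^n (conjugate on the second slot)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def midpoint_integral(func, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def antiderivative(t):
    """An antiderivative of sin(3t) cos(8t)."""
    return math.cos(5 * t) / 10 - math.cos(11 * t) / 22


print(inner([2, 3, 5j], [6, 1, -3j]))   # 0j

approx = midpoint_integral(lambda t: math.sin(3 * t) * math.cos(8 * t), -math.pi, math.pi)
print(abs(approx) < 1e-9)                                                # True
print(abs(antiderivative(math.pi) - antiderivative(-math.pi)) < 1e-12)   # True
```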


Exercise 8 asks you to prove that if $a$ and $b$ are nonzero elements in $\mathbf{R}^2$, then
$$\langle a, b \rangle = \|a\| \, \|b\| \cos\theta,$$


where $\theta$ is the angle between $a$ and $b$ (thinking of $a$ as the vector whose initial point is
the origin and whose end point is $a$, and similarly for $b$). Thus two elements of $\mathbf{R}^2$ are
orthogonal if and only if the cosine of the angle between them is 0, which happens if
and only if the vectors are perpendicular in the usual sense of plane geometry. Thus
you can think of the word orthogonal as a fancy word meaning perpendicular.


Law professor Richard Friedman presenting a case before the U.S. 
Supreme Court in 2010: 


Mr. Friedman: I think that issue is entirely orthogonal to the issue here 
because the Commonwealth is acknowledging— 

Chief Justice Roberts: I’m sorry. Entirely what?

Mr. Friedman: Orthogonal. Right angle. Unrelated. Irrelevant. 

Chief Justice Roberts: Oh. 

Justice Scalia: What was that adjective? I liked that. 

Mr. Friedman: Orthogonal. 

Chief Justice Roberts: Orthogonal. 

Mr. Friedman: Right, right. 

Justice Scalia: Orthogonal, ooh. (Laughter.) 

Justice Kennedy: I knew this case presented us a problem. (Laughter.)
The next theorem is over 2500 years old, although it was not originally stated in
the context of inner product spaces.


8.9 Pythagorean Theorem 


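Suppose $f$ and $g$ are orthogonal elements of an inner product space. Then


$$\|f + g\|^2 = \|f\|^2 + \|g\|^2.$$


Proof We have
$$\begin{aligned}
\|f + g\|^2 &= \langle f + g, f + g \rangle \\
&= \langle f, f \rangle + \langle f, g \rangle + \langle g, f \rangle + \langle g, g \rangle \\
&= \|f\|^2 + \|g\|^2,
\end{aligned}$$


as desired.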


Exercise 3 shows that whether or not the converse of the Pythagorean Theorem 
holds depends upon whether F = R or F= C. 


Suppose $f$ and $g$ are elements of an inner product space $V$,
with $g \neq 0$. Frequently it is useful to write $f$ as some number $c$
times $g$ plus an element $h$ of $V$ that is orthogonal to $g$. The figure
here suggests that such a decomposition should be possible. To
find the appropriate choice for $c$, note that if $f = cg + h$ for
some $c \in \mathbf{F}$ and some $h \in V$ with $\langle h, g \rangle = 0$, then we must
have

$$\langle f, g \rangle = \langle cg + h, g \rangle = c\|g\|^2,$$
which implies that $c = \dfrac{\langle f, g \rangle}{\|g\|^2}$, which then implies that
$$h = f - \frac{\langle f, g \rangle}{\|g\|^2} g.$$
Hence we are led to the following result.

[Figure: Here $f = cg + h$, where $h$ is orthogonal to $g$.]


8.10 orthogonal decomposition


Suppose $f$ and $g$ are elements of an inner product space $V$, with $g \neq 0$. Then there
exists $h \in V$ such that


$$\langle h, g \rangle = 0 \quad \text{and} \quad f = \frac{\langle f, g \rangle}{\|g\|^2} g + h.$$


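Proof Set $h = f - \dfrac{\langle f, g \rangle}{\|g\|^2} g$. Then


$$\langle h, g \rangle = \Bigl\langle f - \frac{\langle f, g \rangle}{\|g\|^2} g, g \Bigr\rangle = \langle f, g \rangle - \frac{\langle f, g \rangle}{\|g\|^2} \langle g, g \rangle = 0,$$


giving the first equation in the conclusion. The second equation in the conclusion
follows immediately from the definition of $h$.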
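The proof of 8.10 is constructive, so it can be carried out directly. The Python sketch below does this in $\mathbf{C}^n$ with the standard inner product of 8.2; the helper name `orthogonal_decomposition` and the sample vectors are illustrative choices, not from the text.

```python
import cmath


def inner(a, b):
    """Standard inner product on C^n (conjugate on the second slot)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def orthogonal_decomposition(f, g):
    """Return (c, h) with f = c g + h and <h, g> = 0, where c = <f, g> / ||g||^2."""
    g_norm_sq = inner(g, g).real
    if g_norm_sq == 0:
        raise ValueError("g must be nonzero")
    c = inner(f, g) / g_norm_sq
    h = [x - c * y for x, y in zip(f, g)]
    return c, h


f = [3, 1j, 2]
g = [1, 1, 1j]
c, h = orthogonal_decomposition(f, g)
print(c)                                   # <f, g> / ||g||^2, which is (3 - i)/3 here
print(abs(inner(h, g)) < 1e-12)            # True: h is orthogonal to g
print(all(cmath.isclose(x, c * y + z) for x, y, z in zip(f, g, h)))  # True: f = c g + h
```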
The orthogonal decomposition 8.10 is the main ingredient in our proof of the next
result, which is one of the most important inequalities in mathematics.


8.11 Cauchy–Schwarz inequality


Suppose $f$ and $g$ are elements of an inner product space. Then


$$|\langle f, g \rangle| \le \|f\| \, \|g\|,$$


with equality if and only if one of $f, g$ is a scalar multiple of the other.


Proof If $g = 0$, then both sides of the desired inequality equal 0. Thus we can
assume $g \neq 0$. Consider the orthogonal decomposition


$$f = \frac{\langle f, g \rangle}{\|g\|^2} g + h$$


given by 8.10, where $h$ is orthogonal to $g$. The Pythagorean Theorem (8.9) implies


$$\begin{aligned}
\|f\|^2 &= \Bigl\|\frac{\langle f, g \rangle}{\|g\|^2} g\Bigr\|^2 + \|h\|^2 \\
&= \frac{|\langle f, g \rangle|^2}{\|g\|^2} + \|h\|^2 \\
&\ge \frac{|\langle f, g \rangle|^2}{\|g\|^2}. \qquad (8.12)
\end{aligned}$$


Multiplying both sides of this inequality by $\|g\|^2$ and then taking square roots gives
the desired inequality.

The proof above shows that the Cauchy–Schwarz inequality is an equality if and
only if 8.12 is an equality. This happens if and only if $h = 0$. But $h = 0$ if and only
if $f$ is a scalar multiple of $g$ (see 8.10). Thus the Cauchy–Schwarz inequality is an
equality if and only if $f$ is a scalar multiple of $g$ or $g$ is a scalar multiple of $f$ (or
both; the phrasing has been chosen to cover cases in which either $f$ or $g$ equals 0).
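A quick numerical look at 8.11 and its equality case, again in $\mathbf{C}^n$ with the standard inner product. The random vectors and the scalar $2 - i$ in the Python sketch below are arbitrary choices for illustration.

```python
import math
import random


def inner(a, b):
    """Standard inner product on C^n (conjugate on the second slot)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def norm(a):
    """||a|| = sqrt(<a, a>); <a, a> is real and nonnegative."""
    return math.sqrt(inner(a, a).real)


rng = random.Random(3)
f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)]
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)]

# Strict inequality for these (generic, non-parallel) random vectors.
print(abs(inner(f, g)) < norm(f) * norm(g))                     # True

# Equality when one vector is a scalar multiple of the other.
h = [(2 - 1j) * x for x in g]
print(math.isclose(abs(inner(h, g)), norm(h) * norm(g)))        # True
```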


8.13 Example Cauchy–Schwarz inequality for $\mathbf{F}^n$


Applying the Cauchy–Schwarz inequality with the standard inner product on $\mathbf{F}^n$
to $(|a_1|, \dots, |a_n|)$ and $(|b_1|, \dots, |b_n|)$ gives the inequality


$$|a_1 b_1| + \dots + |a_n b_n| \le \sqrt{|a_1|^2 + \dots + |a_n|^2} \, \sqrt{|b_1|^2 + \dots + |b_n|^2}$$


for all $(a_1, \dots, a_n), (b_1, \dots, b_n) \in \mathbf{F}^n$.


Thus we have a new and clean proof of Hölder’s inequality (7.9) for the special case where $\mu$ is counting measure on $\{1, \dots, n\}$ and $p = p' = 2$.

The inequality in this example was first proved by Cauchy in 1821.
8.14 Example Cauchy–Schwarz inequality for $L^2(\mu)$


Suppose $\mu$ is a measure and $f, g \in L^2(\mu)$. Applying the Cauchy–Schwarz
inequality with the standard inner product on $L^2(\mu)$ to $|f|$ and $|g|$ gives the inequality


$$\int |fg| \, d\mu \le \Bigl(\int |f|^2 \, d\mu\Bigr)^{1/2} \Bigl(\int |g|^2 \, d\mu\Bigr)^{1/2}.$$


The inequality above is equivalent to Hölder’s inequality (7.9) for the special case where $p = p' = 2$. However, the proof of the inequality above via the Cauchy–Schwarz inequality still depends upon Hölder’s inequality to show that the definition of the standard inner product on $L^2(\mu)$ makes sense. See Exercise 18 in this section for a derivation of the inequality above that is truly independent of Hölder’s inequality.

In 1859 Viktor Bunyakovsky (1804–1889), who had been Cauchy’s student in Paris, first proved integral inequalities like the one above. Similar discoveries by Hermann Schwarz (1843–1921) in 1885 attracted more attention and led to the name of this inequality.


If we think of the norm determined by an inner product as a length, then the triangle inequality has the geometric interpretation that the length of each side of a triangle is less than the sum of the lengths of the other two sides.


8.15 triangle inequality


Suppose $f$ and $g$ are elements of an inner product space. Then


$$\|f + g\| \le \|f\| + \|g\|,$$


with equality if and only if one of $f, g$ is a nonnegative multiple of the other.


Proof We have
$$\begin{aligned}
\|f + g\|^2 &= \langle f + g, f + g \rangle \\
&= \langle f, f \rangle + \langle g, g \rangle + \langle f, g \rangle + \langle g, f \rangle \\
&= \langle f, f \rangle + \langle g, g \rangle + \langle f, g \rangle + \overline{\langle f, g \rangle} \\
&= \|f\|^2 + \|g\|^2 + 2 \operatorname{Re} \langle f, g \rangle \\
&\le \|f\|^2 + \|g\|^2 + 2 |\langle f, g \rangle| \qquad (8.16) \\
&\le \|f\|^2 + \|g\|^2 + 2 \|f\| \, \|g\| \qquad (8.17) \\
&= (\|f\| + \|g\|)^2,
\end{aligned}$$


where 8.17 follows from the Cauchy–Schwarz inequality (8.11). Taking square roots
of both sides of the inequality above gives the desired inequality.
The proof above shows that the triangle inequality is an equality if and only if we
have equality in 8.16 and 8.17. Thus we have equality in the triangle inequality if
and only if


$$\langle f, g \rangle = \|f\| \, \|g\|. \tag{8.18}$$


If one of $f, g$ is a nonnegative multiple of the other, then 8.18 holds, as you should
verify. Conversely, suppose 8.18 holds. Then the condition for equality in the Cauchy–Schwarz inequality (8.11) implies that one of $f, g$ is a scalar multiple of the other.
Clearly 8.18 forces the scalar in question to be nonnegative, as desired.
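The equality case of 8.15 requires a nonnegative multiple, not merely a scalar multiple. The Python sketch below (with an arbitrarily chosen $f \in \mathbf{C}^2$) compares $\|f + g\|$ with $\|f\| + \|g\|$ for $g = 2f$, $g = -f$, and $g = if$; only $g = 2f$ gives equality, even though all three are scalar multiples of $f$.

```python
import math


def inner(a, b):
    """Standard inner product on C^n (conjugate on the second slot)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def norm(a):
    """||a|| = sqrt(<a, a>)."""
    return math.sqrt(inner(a, a).real)


def triangle_gap(f, g):
    """||f|| + ||g|| - ||f + g||, which is >= 0 by 8.15."""
    return norm(f) + norm(g) - norm([x + y for x, y in zip(f, g)])


f = [1 + 1j, 2.0]
for label, scalar in (("2f", 2), ("-f", -1), ("if", 1j)):
    g = [scalar * x for x in f]
    print(label, math.isclose(triangle_gap(f, g), 0.0, abs_tol=1e-12))
# prints "2f True", then "-f False", then "if False"
```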


Applying the previous result to the inner product space $L^2(\mu)$, where $\mu$ is a
measure, gives a new proof of Minkowski’s inequality (7.14) for the case $p = 2$.

Now we can prove that what we have been calling a norm on an inner product 
space is indeed a norm. 


8.19 $\|\cdot\|$ is a norm


Suppose $V$ is an inner product space and $\|f\|$ is defined as usual by


$$\|f\| = \sqrt{\langle f, f \rangle}$$


for $f \in V$. Then $\|\cdot\|$ is a norm on $V$.


Proof The definition of an inner product implies that $\|\cdot\|$ satisfies the positive definite requirement for a norm. The homogeneity and triangle inequality requirements
for a norm are satisfied because of 8.6 and 8.15.


The next result has the geometric in- 
terpretation that the sum of the squares 
of the lengths of the diagonals of a 
parallelogram equals the sum of the 
squares of the lengths of the four sides. 


8.20 parallelogram equality 


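Suppose $f$ and $g$ are elements of an inner product space. Then


$$\|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2.$$


Proof We have
$$\begin{aligned}
\|f + g\|^2 + \|f - g\|^2 &= \langle f + g, f + g \rangle + \langle f - g, f - g \rangle \\
&= \|f\|^2 + \|g\|^2 + \langle f, g \rangle + \langle g, f \rangle \\
&\quad + \|f\|^2 + \|g\|^2 - \langle f, g \rangle - \langle g, f \rangle \\
&= 2\|f\|^2 + 2\|g\|^2,
\end{aligned}$$


as desired.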
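The parallelogram equality holds for every norm that comes from an inner product but can fail for other norms. The Python sketch below checks it for the Euclidean norm on random vectors in $\mathbf{C}^4$, then shows that it fails for the norm $|a_1| + \dots + |a_n|$ on $\mathbf{F}^n$ at $f = (1, 0)$, $g = (0, 1)$, where the defect is $4 + 4 - 2 - 2 = 4$. The helper names are our own.

```python
import math
import random


def norm2(a):
    """Euclidean norm, the norm associated with the standard inner product on C^n."""
    return math.sqrt(sum(abs(x) ** 2 for x in a))


def norm1(a):
    """The norm |a_1| + ... + |a_n|, which does not come from an inner product."""
    return sum(abs(x) for x in a)


def parallelogram_defect(norm, f, g):
    """||f + g||^2 + ||f - g||^2 - 2||f||^2 - 2||g||^2 (zero for inner product norms)."""
    plus = [x + y for x, y in zip(f, g)]
    minus = [x - y for x, y in zip(f, g)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * norm(f) ** 2 - 2 * norm(g) ** 2


rng = random.Random(5)
f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
print(abs(parallelogram_defect(norm2, f, g)) < 1e-9)       # True
print(parallelogram_defect(norm1, [1.0, 0.0], [0.0, 1.0]))  # 4.0
```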




EXERCISES 8A 


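1 Let $V$ denote the vector space of bounded continuous functions from $\mathbf{R}$ to $\mathbf{F}$.
Let $r_1, r_2, \dots$ be a list of the rational numbers. For $f, g \in V$, define


$$\langle f, g \rangle = \sum_{k=1}^{\infty} \frac{f(r_k) \overline{g(r_k)}}{2^k}.$$


Show that $\langle \cdot, \cdot \rangle$ is an inner product on $V$.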


2 Prove that if $f, g \in L^2(\mu)$, then


$$\|f\|^2 \|g\|^2 - |\langle f, g \rangle|^2 = \frac{1}{2} \int\!\!\int |f(x) g(y) - g(x) f(y)|^2 \, d\mu(y) \, d\mu(x).$$


3 Suppose $f$ and $g$ are elements of an inner product space and


$$\|f + g\|^2 = \|f\|^2 + \|g\|^2.$$


(a) Prove that if $\mathbf{F} = \mathbf{R}$, then $f$ and $g$ are orthogonal.


(b) Give an example to show that if $\mathbf{F} = \mathbf{C}$, then $f$ and $g$ can satisfy the
equation above without being orthogonal.


4 Find $a, b \in \mathbf{R}^3$ such that $a$ is a scalar multiple of $(1, 6, 3)$, $b$ is orthogonal to
$(1, 6, 3)$, and $(5, 4, -2) = a + b$.


5 Prove that 


for all positive numbers a,b, c,d, with equality if and only ifa =b=c=d. 


6 Prove that the square of the average of each finite list of real numbers containing 
at least two distinct real numbers is less than the average of the squares of the 
numbers in that list. 


7 Suppose $f$ and $g$ are elements of an inner product space and $\|f\| \le 1$ and $\|g\| \le 1$. Prove that

$$\sqrt{1 - \|f\|^2}\,\sqrt{1 - \|g\|^2} \le 1 - |\langle f, g\rangle|.$$

8 Suppose $a$ and $b$ are nonzero elements of $\mathbf{R}^2$. Prove that

$$\langle a, b\rangle = \|a\|\,\|b\| \cos\theta,$$

where $\theta$ is the angle between $a$ and $b$ (thinking of $a$ as the vector whose initial point is the origin and whose end point is $a$, and similarly for $b$).

Hint: Draw the triangle formed by $a$, $b$, and $a - b$; then use the law of cosines.

9 The angle between two vectors (thought of as arrows with initial point at the origin) in $\mathbf{R}^2$ or $\mathbf{R}^3$ can be defined geometrically. However, geometry is not as clear in $\mathbf{R}^n$ for $n > 3$. Thus the angle between two nonzero vectors $a, b \in \mathbf{R}^n$ is defined to be

$$\arccos \frac{\langle a, b\rangle}{\|a\|\,\|b\|},$$

where the motivation for this definition comes from the previous exercise. Explain why the Cauchy–Schwarz inequality is needed to show that this definition makes sense.


10 (a) Suppose $f$ and $g$ are elements of a real inner product space. Prove that $f$ and $g$ have the same norm if and only if $f + g$ is orthogonal to $f - g$.

(b) Use part (a) to show that the diagonals of a parallelogram are perpendicular to each other if and only if the parallelogram is a rhombus.


11 Suppose $f$ and $g$ are elements of an inner product space. Prove that $\|f\| = \|g\|$ if and only if $\|sf + tg\| = \|tf + sg\|$ for all $s, t \in \mathbf{R}$.


12 Suppose $f$ and $g$ are elements of an inner product space and $\|f\| = \|g\| = 1$ and $\langle f, g\rangle = 1$. Prove that $f = g$.


13 Suppose $f$ and $g$ are elements of a real inner product space. Prove that

$$\langle f, g\rangle = \frac{\|f + g\|^2 - \|f - g\|^2}{4}.$$


14 Suppose $f$ and $g$ are elements of a complex inner product space. Prove that

$$\langle f, g\rangle = \frac{\|f + g\|^2 - \|f - g\|^2 + \|f + ig\|^2\, i - \|f - ig\|^2\, i}{4}.$$


15 Suppose $f, g, h$ are elements of an inner product space. Prove that

$$\Bigl\|h - \tfrac{1}{2}(f + g)\Bigr\|^2 = \frac{\|h - f\|^2 + \|h - g\|^2}{2} - \frac{\|f - g\|^2}{4}.$$


16 Prove that a norm satisfying the parallelogram equality comes from an inner product. In other words, show that if $V$ is a normed vector space whose norm $\|\cdot\|$ satisfies the parallelogram equality, then there is an inner product $\langle \cdot, \cdot \rangle$ on $V$ such that $\|f\| = \langle f, f\rangle^{1/2}$ for all $f \in V$.


17 Let $\lambda$ denote Lebesgue measure on $[1, \infty)$.

(a) Prove that if $f\colon [1, \infty) \to [0, \infty)$ is Borel measurable, then

$$\Bigl(\int f \, d\lambda\Bigr)^2 \le \int x^2 \bigl(f(x)\bigr)^2 \, d\lambda(x).$$

(b) Describe the set of Borel measurable functions $f\colon [1, \infty) \to [0, \infty)$ such that the inequality in part (a) is an equality.

18 Suppose $\mu$ is a measure. For $f, g \in L^2(\mu)$, define $\langle f, g\rangle$ by

$$\langle f, g\rangle = \int f \overline{g} \, d\mu.$$

(a) Using the inequality

$$|f(x)\overline{g(x)}| \le \tfrac{1}{2}\bigl(|f(x)|^2 + |g(x)|^2\bigr),$$

verify that the integral above makes sense and the map sending $f, g$ to $\langle f, g\rangle$ defines an inner product on $L^2(\mu)$ (without using Hölder’s inequality).

(b) Show that the Cauchy–Schwarz inequality implies that

$$\|fg\|_1 \le \|f\|_2 \|g\|_2$$

for all $f, g \in L^2(\mu)$ (again, without using Hölder’s inequality).


19 Suppose $V_1, \ldots, V_m$ are inner product spaces. Show that the equation

$$\bigl\langle (f_1, \ldots, f_m), (g_1, \ldots, g_m) \bigr\rangle = \langle f_1, g_1\rangle + \cdots + \langle f_m, g_m\rangle$$

defines an inner product on $V_1 \times \cdots \times V_m$.

[Each of the inner product spaces $V_1, \ldots, V_m$ may have a different inner product, even though the same inner product notation is used on all these spaces.]


20 Suppose $V$ is an inner product space. Make $V \times V$ an inner product space as in the exercise above. Prove that the function that takes an ordered pair $(f, g) \in V \times V$ to the inner product $\langle f, g\rangle \in \mathbf{F}$ is a continuous function from $V \times V$ to $\mathbf{F}$.


21 Suppose $1 \le p \le \infty$.

(a) Show the norm on $\ell^p$ comes from an inner product if and only if $p = 2$.

(b) Show the norm on $L^p(\mathbf{R})$ comes from an inner product if and only if $p = 2$.


22 Use inner products to prove Apollonius’s identity: In a triangle with sides of length $a$, $b$, and $c$, let $d$ be the length of the line segment from the midpoint of the side of length $c$ to the opposite vertex. Then

$$a^2 + b^2 = \tfrac{1}{2}c^2 + 2d^2.$$
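For Exercise 14, a numerical check can help confirm that the signs in the complex polarization identity are right before attempting a proof. The Python sketch below assumes the convention of this chapter that the inner product on $\mathbf{C}^n$ is linear in the first slot, $\langle f, g\rangle = \sum_k f_k \overline{g_k}$; the function names are hypothetical. Agreement on random vectors is evidence, not a proof.

```python
import random


def inner(f, g):
    """Inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(f, g))


def norm_sq(f):
    return inner(f, f).real


def polarization(f, g):
    """Right side of the complex polarization identity in Exercise 14."""
    def shifted(c):  # ||f + c g||^2
        return norm_sq([a + c * b for a, b in zip(f, g)])
    return (shifted(1) - shifted(-1) + shifted(1j) * 1j - shifted(-1j) * 1j) / 4


rng = random.Random(42)
f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
print(abs(polarization(f, g) - inner(f, g)) < 1e-12)  # True
```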
8B Orthogonality


Orthogonal Projections 


The previous section developed inner product spaces following a standard linear 
algebra approach. Linear algebra focuses mainly on finite-dimensional vector spaces. 
Many interesting results about infinite-dimensional inner product spaces require an 
additional hypothesis, which we now introduce. 


8.21 Definition Hilbert space


A Hilbert space is an inner product space that is a Banach space with the norm 
determined by the inner product. 


8.22 Example Hilbert spaces 


• Suppose $\mu$ is a measure. Then $L^2(\mu)$ with its usual inner product is a Hilbert space (by 7.24).

• As a special case of the first bullet point, if $n \in \mathbf{Z}^+$ then taking $\mu$ to be counting measure on $\{1, \ldots, n\}$ shows that $\mathbf{F}^n$ with its usual inner product is a Hilbert space.

• As another special case of the first bullet point, taking $\mu$ to be counting measure on $\mathbf{Z}^+$ shows that $\ell^2$ with its usual inner product is a Hilbert space.

• Every closed subspace of a Hilbert space is a Hilbert space [by 6.16(b)].


8.23 Example not Hilbert spaces 


• The inner product space $\ell^1$, where $\langle (a_1, a_2, \ldots), (b_1, b_2, \ldots) \rangle = \sum_{k=1}^{\infty} a_k \overline{b_k}$, is not a Hilbert space because the associated norm is not complete on $\ell^1$.

• The inner product space $C([0, 1])$ of continuous $\mathbf{F}$-valued functions on the interval $[0, 1]$, where $\langle f, g\rangle = \int_0^1 f \overline{g}$, is not a Hilbert space because the associated norm is not complete on $C([0, 1])$.


The next definition makes sense in the context of normed vector spaces.

8.24 Definition distance from a point to a set

Suppose $U$ is a nonempty subset of a normed vector space $V$ and $f \in V$. The distance from $f$ to $U$, denoted $\operatorname{distance}(f, U)$, is defined by

$$\operatorname{distance}(f, U) = \inf\{\|f - g\| : g \in U\}.$$

Notice that $\operatorname{distance}(f, U) = 0$ if and only if $f \in \overline{U}$ (the closure of $U$).
8.25 Definition convex set


• A subset of a vector space is called convex if the subset contains the line segment connecting each pair of points in it.

• More precisely, suppose $V$ is a vector space and $U \subseteq V$. Then $U$ is called convex if

$$(1 - t)f + tg \in U \text{ for all } t \in [0, 1] \text{ and all } f, g \in U.$$

[Figure: a convex subset of $\mathbf{R}^2$ and a nonconvex subset of $\mathbf{R}^2$.]


8.26 Example convex sets 


• Every subspace of a vector space is convex, as you should verify.

• If $V$ is a normed vector space, $f \in V$, and $r > 0$, then the open ball centered at $f$ with radius $r$ is convex, as you should verify.


The next example shows that the distance from an element of a Banach space to a 
closed subspace is not necessarily attained by some element of the closed subspace. 
After this example, we will prove that this behavior cannot happen in a Hilbert space. 


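8.27 Example no closest element to a closed subspace of a Banach space

In the Banach space $C([0, 1])$ with norm $\|g\| = \sup_{[0,1]} |g|$, let

$$U = \Bigl\{ g \in C([0, 1]) : \int_0^1 g = 0 \text{ and } g(1) = 0 \Bigr\}.$$

Then $U$ is a closed subspace of $C([0, 1])$.

Let $f \in C([0, 1])$ be defined by $f(x) = 1 - x$. For $k \in \mathbf{Z}^+$, let

$$g_k(x) = \frac{1}{2} - x + \frac{x^k}{2} + \frac{x^k - 1}{2k}.$$

Then $g_k \in U$ and $\lim_{k \to \infty} \|f - g_k\| = \frac{1}{2}$ (indeed $(f - g_k)(x) = \frac{k+1}{2k}(1 - x^k)$), which implies that $\operatorname{distance}(f, U) \le \frac{1}{2}$.

If $g \in U$, then $\int_0^1 (f - g) = \frac{1}{2}$ and $(f - g)(1) = 0$. These conditions imply that

$$\|f - g\| > \tfrac{1}{2},$$

because if $\|f - g\| \le \frac{1}{2}$, then $f - g \le \frac{1}{2}$ on $[0, 1]$ and (by continuity) $f - g < \frac{1}{2}$ near $1$, which would force $\int_0^1 (f - g) < \frac{1}{2}$. Thus $\operatorname{distance}(f, U) = \frac{1}{2}$ but there does not exist $g \in U$ such that $\|f - g\| = \frac{1}{2}$.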
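The claims about $g_k$ in Example 8.27 can be spot-checked numerically. The Python sketch below takes $g_k(x) = \frac{1}{2} - x + \frac{x^k}{2} + \frac{x^k - 1}{2k}$ (a closed form chosen so that $g_k(1) = 0$ and $\int_0^1 g_k = 0$, so $g_k \in U$), approximates the sup norm of $f - g_k$ on a uniform grid, and approximates $\int_0^1 g_k$ by the midpoint rule. The grid size is an arbitrary choice; the output illustrates $\|f - g_k\| = \frac{k+1}{2k} \to \frac{1}{2}$ without ever reaching $\frac{1}{2}$.

```python
def f(x):
    return 1 - x


def g(k, x):
    """g_k(x) = 1/2 - x + x^k/2 + (x^k - 1)/(2k); satisfies g_k(1) = 0 and integral 0."""
    return 0.5 - x + x**k / 2 + (x**k - 1) / (2 * k)


def sup_dist(k, n=10_000):
    """Approximate ||f - g_k|| = sup over [0, 1] of |f - g_k| on a uniform grid."""
    return max(abs(f(i / n) - g(k, i / n)) for i in range(n + 1))


def integral_g(k, n=10_000):
    """Midpoint-rule approximation of the integral of g_k over [0, 1]."""
    h = 1 / n
    return sum(g(k, (i + 0.5) * h) for i in range(n)) * h


for k in (1, 10, 100):
    # exact value of ||f - g_k|| is (k + 1)/(2k), attained at x = 0
    print(k, sup_dist(k), (k + 1) / (2 * k), integral_g(k))
```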
In the next result, we use for the first time the hypothesis that V is a Hilbert space.


8.28 distance to a closed convex set is attained in a Hilbert space 


• The distance from an element of a Hilbert space to a nonempty closed convex set is attained by a unique element of the nonempty closed convex set.

• More specifically, suppose $V$ is a Hilbert space, $f \in V$, and $U$ is a nonempty closed convex subset of $V$. Then there exists a unique $g \in U$ such that

$$\|f - g\| = \operatorname{distance}(f, U).$$


Proof First we prove the existence of an element of $U$ that attains the distance to $f$. To do this, suppose $g_1, g_2, \ldots$ is a sequence of elements of $U$ such that

$$\lim_{k \to \infty} \|f - g_k\| = \operatorname{distance}(f, U). \qquad \text{(8.29)}$$

Then for $j, k \in \mathbf{Z}^+$ we have

$$\begin{aligned}
\|g_j - g_k\|^2 &= \|(f - g_k) - (f - g_j)\|^2 \\
&= 2\|f - g_k\|^2 + 2\|f - g_j\|^2 - \|2f - (g_k + g_j)\|^2 \\
&= 2\|f - g_k\|^2 + 2\|f - g_j\|^2 - 4\Bigl\|f - \frac{g_k + g_j}{2}\Bigr\|^2 \\
&\le 2\|f - g_k\|^2 + 2\|f - g_j\|^2 - 4\bigl(\operatorname{distance}(f, U)\bigr)^2, \qquad \text{(8.30)}
\end{aligned}$$

where the second equality comes from the parallelogram equality (8.20) and the last line holds because the convexity of $U$ implies that $(g_k + g_j)/2 \in U$. Now the inequality above and 8.29 imply that $g_1, g_2, \ldots$ is a Cauchy sequence. Thus there exists $g \in V$ such that

$$\lim_{k \to \infty} \|g_k - g\| = 0. \qquad \text{(8.31)}$$

Because $U$ is a closed subset of $V$ and each $g_k \in U$, we know that $g \in U$. Now 8.29 and 8.31 imply that

$$\|f - g\| = \operatorname{distance}(f, U),$$

which completes the proof of the existence part of this result.

To prove the uniqueness part of this result, suppose $g$ and $\tilde{g}$ are elements of $U$ such that

$$\|f - g\| = \|f - \tilde{g}\| = \operatorname{distance}(f, U). \qquad \text{(8.32)}$$

Then

$$\begin{aligned}
\|g - \tilde{g}\|^2 &\le 2\|f - g\|^2 + 2\|f - \tilde{g}\|^2 - 4\bigl(\operatorname{distance}(f, U)\bigr)^2 \\
&= 0, \qquad \text{(8.33)}
\end{aligned}$$

where the first line above follows from 8.30 (with $g_j$ replaced by $g$ and $g_k$ replaced by $\tilde{g}$) and the last line above follows from 8.32. Now 8.33 implies that $g = \tilde{g}$, completing the proof of uniqueness.
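A concrete case of 8.28 in the Hilbert space $\mathbf{R}^n$ with the Euclidean norm: the box $U = [\text{lo}, \text{hi}]^n$ is nonempty, closed, and convex, and because the squared distance splits into a sum over coordinates, the closest point of $U$ to $f$ is obtained by clipping each coordinate of $f$ into $[\text{lo}, \text{hi}]$. The Python sketch below (function names are ours) computes that point and compares it against many random points of $U$; the random search only illustrates the minimizing property, it does not prove it.

```python
import math
import random


def project_box(f, lo, hi):
    """Closest point of the box [lo, hi]^n to f in the Euclidean norm.

    Coordinates decouple, so each one is clipped independently.
    """
    return [min(max(x, lo), hi) for x in f]


def dist(f, g):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def no_sample_closer(f, p, samples):
    """True if no sampled point of U is strictly closer to f than p."""
    d = dist(f, p)
    return all(dist(f, s) >= d for s in samples)


rng = random.Random(1)
f = [2.0, -0.5, 0.3]
p = project_box(f, 0.0, 1.0)
samples = [[rng.uniform(0.0, 1.0) for _ in f] for _ in range(10_000)]
print(p)                                # [1.0, 0.0, 0.3]
print(no_sample_closer(f, p, samples))  # True
```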
Example 8.27 showed that the existence part of the previous result can fail in a
Banach space. Exercise 13 shows that the uniqueness part can also fail in a Banach 
space. These observations highlight the advantages of working in a Hilbert space. 


8.34 Definition orthogonal projection; $P_U$

Suppose $U$ is a nonempty closed convex subset of a Hilbert space $V$. The orthogonal projection of $V$ onto $U$ is the function $P_U\colon V \to V$ defined by setting $P_U(f)$ equal to the unique element of $U$ that is closest to $f$.

The definition above makes sense because of 8.28. We will often use the notation $P_U f$ instead of $P_U(f)$. To test your understanding of the definition above, make sure that you can show that if $U$ is a nonempty closed convex subset of a Hilbert space $V$, then

• $P_U f = f$ if and only if $f \in U$;

• $P_U \circ P_U = P_U$.


8.35 Example orthogonal projection onto closed unit ball

Suppose $U$ is the closed unit ball $\{g \in V : \|g\| \le 1\}$ in a Hilbert space $V$. Then

$$P_U f = \begin{cases} f & \text{if } \|f\| \le 1, \\[4pt] \dfrac{f}{\|f\|} & \text{if } \|f\| > 1, \end{cases}$$

as you should verify.
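Example 8.35 can be tried out for $V = \mathbf{R}^n$ with the Euclidean norm in a minimal Python sketch (`project_unit_ball` is our name, not the text's). The last line shows that this $P_U$ is not linear: $P_U(2f) \ne 2 P_U f$ when $\|f\| > 1$.

```python
import math


def norm(f):
    return math.sqrt(sum(abs(x) ** 2 for x in f))


def project_unit_ball(f):
    """P_U f for U the closed unit ball: f itself if ||f|| <= 1, else f/||f||."""
    n = norm(f)
    return list(f) if n <= 1 else [x / n for x in f]


f = [3.0, 4.0]                        # ||f|| = 5
print(project_unit_ball(f))           # [0.6, 0.8]
print(project_unit_ball([0.5, 0.5]))  # [0.5, 0.5], already in U
# P_U is not linear: P_U(2f) = [0.6, 0.8], but 2 * P_U(f) = [1.2, 1.6]
print(project_unit_ball([6.0, 8.0]), [2 * x for x in project_unit_ball(f)])
```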


8.36 Example orthogonal projection onto a closed subspace 


Suppose $U$ is the closed subspace of $\ell^2$ consisting of the elements of $\ell^2$ whose even coordinates are all $0$:

$$U = \Bigl\{(a_1, 0, a_3, 0, a_5, 0, \ldots) : \text{each } a_k \in \mathbf{F} \text{ and } \sum_{k=1}^{\infty} |a_{2k-1}|^2 < \infty\Bigr\}.$$

Then for $b = (b_1, b_2, b_3, b_4, b_5, b_6, \ldots) \in \ell^2$, we have

$$P_U b = (b_1, 0, b_3, 0, b_5, 0, \ldots),$$

as you should verify.

Note that in this example the function $P_U$ is a linear map from $\ell^2$ to $\ell^2$ (unlike the behavior in Example 8.35).

Also, notice that $b - P_U b = (0, b_2, 0, b_4, 0, b_6, \ldots)$ and thus $b - P_U b$ is orthogonal to every element of $U$.
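To experiment with Example 8.36, one can model elements of $\ell^2$ that have only finitely many nonzero coordinates as finite Python lists (a simplifying assumption: all omitted trailing coordinates are $0$). The sketch below implements $P_U$ as "zero out the even-numbered coordinates" (coordinates counted from $1$) and checks that $b - P_U b$ is orthogonal to a sample element of $U$ and that $P_U$ respects addition. The names are ours.

```python
def project_odd(b):
    """P_U b for U = {sequences whose even coordinates (1-indexed) are 0}."""
    return [x if k % 2 == 1 else 0 for k, x in enumerate(b, start=1)]


def inner(f, g):
    # pad the shorter list with zeros, matching finitely supported sequences in l^2
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return sum(a * complex(c).conjugate() for a, c in zip(f, g))


b = [1, 2, 3, 4, 5]
p = project_odd(b)                         # [1, 0, 3, 0, 5]
residual = [x - y for x, y in zip(b, p)]   # [0, 2, 0, 4, 0]
u = [7, 0, -1, 0, 2, 0, 9]                 # an element of U
print(p, residual, inner(residual, u))     # the inner product is 0
```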


The next result shows that the properties stated in the last two paragraphs of the 
example above hold whenever U is a closed subspace of a Hilbert space. 
8.37 orthogonal projection onto closed subspace

Suppose $U$ is a closed subspace of a Hilbert space $V$ and $f \in V$. Then

(a) $f - P_U f$ is orthogonal to $g$ for every $g \in U$;

(b) if $h \in U$ and $f - h$ is orthogonal to $g$ for every $g \in U$, then $h = P_U f$;

(c) $P_U\colon V \to V$ is a linear map;

(d) $\|P_U f\| \le \|f\|$, with equality if and only if $f \in U$.


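Proof The figure below illustrates (a). To prove (a), suppose $g \in U$. Then for all $\alpha \in \mathbf{F}$ we have

$$\begin{aligned}
\|f - P_U f\|^2 &\le \|f - P_U f + \alpha g\|^2 \\
&= \langle f - P_U f + \alpha g, f - P_U f + \alpha g\rangle \\
&= \|f - P_U f\|^2 + |\alpha|^2 \|g\|^2 + 2 \operatorname{Re} \overline{\alpha} \langle f - P_U f, g\rangle,
\end{aligned}$$

where the inequality holds because $P_U f - \alpha g \in U$. Let $\alpha = -t \langle f - P_U f, g\rangle$ for $t > 0$. A tiny bit of algebra applied to the inequality above implies

$$2 |\langle f - P_U f, g\rangle|^2 \le t\, |\langle f - P_U f, g\rangle|^2 \, \|g\|^2$$

for all $t > 0$. Thus $\langle f - P_U f, g\rangle = 0$, completing the proof of (a).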
[Figure: $f - P_U f$ is orthogonal to each element of $U$.]

To prove (b), suppose $h \in U$ and $f - h$ is orthogonal to $g$ for every $g \in U$. If $g \in U$, then $h - g \in U$ and hence $f - h$ is orthogonal to $h - g$. Thus

$$\begin{aligned}
\|f - h\|^2 &\le \|f - h\|^2 + \|h - g\|^2 \\
&= \|(f - h) + (h - g)\|^2 \\
&= \|f - g\|^2,
\end{aligned}$$

where the first equality above follows from the Pythagorean Theorem (8.9). Thus

$$\|f - h\| \le \|f - g\|$$

for all $g \in U$. Hence $h$ is the element of $U$ that minimizes the distance to $f$, which implies that $h = P_U f$, completing the proof of (b).

To prove (c), suppose $f_1, f_2 \in V$. If $g \in U$, then (a) implies that $\langle f_1 - P_U f_1, g\rangle = \langle f_2 - P_U f_2, g\rangle = 0$, and thus

$$\bigl\langle (f_1 + f_2) - (P_U f_1 + P_U f_2), g \bigr\rangle = 0.$$

The equation above and (b) now imply that

$$P_U(f_1 + f_2) = P_U f_1 + P_U f_2.$$

The equation above and the equation $P_U(\alpha f) = \alpha P_U f$ for $\alpha \in \mathbf{F}$ (whose proof is left to the reader) show that $P_U$ is a linear map, proving (c).

The proof of (d) is left as an exercise for the reader.
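In a finite-dimensional real inner product space the closest point in a subspace can be computed explicitly. The Python sketch below assumes $\mathbf{F} = \mathbf{R}$ and $V = \mathbf{R}^n$ with the dot product, builds an orthonormal basis of $U$ (the span of a few given vectors) by the Gram–Schmidt procedure, and uses the standard linear-algebra formula $P_U f = \sum_j \langle f, e_j\rangle e_j$ for an orthonormal basis $e_j$ of $U$ (a finite-dimensional fact that this section does not prove). It then checks properties (a), (c), and (d) of 8.37 on an example. All names are ours.

```python
import math


def dot(f, g):
    return sum(a * b for a, b in zip(f, g))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis of span(vectors) (modified Gram-Schmidt; drops dependent vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [a - c * b for a, b in zip(w, e)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([a / n for a in w])
    return basis


def project(f, basis):
    """P_U f = sum of <f, e_j> e_j over an orthonormal basis e_j of U."""
    out = [0.0] * len(f)
    for e in basis:
        c = dot(f, e)
        out = [a + c * b for a, b in zip(out, e)]
    return out


spanning = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
basis = gram_schmidt(spanning)
f = [1.0, 2.0, 3.0, 4.0]
pf = project(f, basis)
residual = [a - b for a, b in zip(f, pf)]
# (a): f - P_U f is orthogonal to U;  (d): ||P_U f|| <= ||f||
print([dot(residual, u) for u in spanning])
print(math.sqrt(dot(pf, pf)) <= math.sqrt(dot(f, f)))
```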
Orthogonal Complements


8.38 Definition orthogonal complement; $U^\perp$

Suppose $U$ is a subset of an inner product space $V$. The orthogonal complement of $U$ is denoted by $U^\perp$ and is defined by

$$U^\perp = \{h \in V : \langle g, h\rangle = 0 \text{ for all } g \in U\}.$$

In other words, the orthogonal complement of a subset $U$ of an inner product space $V$ is the set of elements of $V$ that are orthogonal to every element of $U$.


8.39 Example orthogonal complement 


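Suppose $U$ is the set of elements of $\ell^2$ whose even coordinates are all $0$:

$$U = \Bigl\{(a_1, 0, a_3, 0, a_5, 0, \ldots) : \text{each } a_k \in \mathbf{F} \text{ and } \sum_{k=1}^{\infty} |a_{2k-1}|^2 < \infty\Bigr\}.$$

Then $U^\perp$ is the set of elements of $\ell^2$ whose odd coordinates are all $0$:

$$U^\perp = \Bigl\{(0, a_2, 0, a_4, 0, a_6, \ldots) : \text{each } a_k \in \mathbf{F} \text{ and } \sum_{k=1}^{\infty} |a_{2k}|^2 < \infty\Bigr\},$$

as you should verify.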


8.40 properties of orthogonal complement 


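Suppose $U$ is a subset of an inner product space $V$. Then

(a) $U^\perp$ is a closed subspace of $V$;

(b) $U \cap U^\perp \subseteq \{0\}$;

(c) if $W \subseteq U$, then $U^\perp \subseteq W^\perp$;

(d) $\overline{U}^\perp = U^\perp$;

(e) $U \subseteq (U^\perp)^\perp$.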


Proof To prove (a), suppose $h_1, h_2, \ldots$ is a sequence in $U^\perp$ that converges to some $h \in V$. If $g \in U$, then

$$|\langle g, h\rangle| = |\langle g, h - h_k\rangle| \le \|g\|\,\|h - h_k\| \quad \text{for each } k \in \mathbf{Z}^+;$$

hence $\langle g, h\rangle = 0$, which implies that $h \in U^\perp$. Thus $U^\perp$ is closed. The proof of (a) is completed by showing that $U^\perp$ is a subspace of $V$, which is left to the reader.

To prove (b), suppose $g \in U \cap U^\perp$. Then $\langle g, g\rangle = 0$, which implies that $g = 0$, proving (b).

To prove (e), suppose $g \in U$. Thus $\langle g, h\rangle = 0$ for all $h \in U^\perp$, which implies that $g \in (U^\perp)^\perp$. Hence $U \subseteq (U^\perp)^\perp$, proving (e).

The proofs of (c) and (d) are left to the reader.
The results in the rest of this subsection have as a hypothesis that V is a Hilbert
space. These results do not hold when V is only an inner product space. 


8.41 orthogonal complement of the orthogonal complement 


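Suppose $U$ is a subspace of a Hilbert space $V$. Then

$$\overline{U} = (U^\perp)^\perp.$$

Proof Applying 8.40(a) to $U^\perp$, we see that $(U^\perp)^\perp$ is a closed subspace of $V$. Now taking closures of both sides of the inclusion $U \subseteq (U^\perp)^\perp$ [8.40(e)] shows that $\overline{U} \subseteq (U^\perp)^\perp$.

To prove the inclusion in the other direction, suppose $f \in (U^\perp)^\perp$. Because $f \in (U^\perp)^\perp$ and $P_{\overline{U}} f \in \overline{U} \subseteq (U^\perp)^\perp$ (by the previous paragraph), we see that

$$f - P_{\overline{U}} f \in (U^\perp)^\perp.$$

Also,

$$f - P_{\overline{U}} f \in U^\perp$$

by 8.37(a) and 8.40(d). Hence

$$f - P_{\overline{U}} f \in U^\perp \cap (U^\perp)^\perp.$$

Now 8.40(b) (applied to $U^\perp$ in place of $U$) implies that $f - P_{\overline{U}} f = 0$, which implies that $f \in \overline{U}$. Thus $(U^\perp)^\perp \subseteq \overline{U}$, completing the proof.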


As a special case, the result above implies that if $U$ is a closed subspace of a Hilbert space $V$, then $U = (U^\perp)^\perp$.

Another special case of the result above is sufficiently useful to deserve stating 
separately, as we do in the next result. 


8.42 necessary and sufficient condition for a subspace to be dense 


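Suppose $U$ is a subspace of a Hilbert space $V$. Then

$$\overline{U} = V \text{ if and only if } U^\perp = \{0\}.$$

Proof First suppose $\overline{U} = V$. Then using 8.40(d), we have

$$U^\perp = \overline{U}^\perp = V^\perp = \{0\}.$$

To prove the other direction, now suppose $U^\perp = \{0\}$. Then 8.41 implies that

$$\overline{U} = (U^\perp)^\perp = \{0\}^\perp = V,$$

completing the proof.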
The next result states that if $U$ is a closed subspace of a Hilbert space $V$, then $V$ is the direct sum of $U$ and $U^\perp$, often written $V = U \oplus U^\perp$, although we do not need to use this terminology or notation further.

The key point to keep in mind is that the next result shows that the picture here represents what happens in general for a closed subspace $U$ of a Hilbert space $V$: every element of $V$ can be uniquely written as an element of $U$ plus an element of $U^\perp$.

[Figure: a closed subspace $U$ and its orthogonal complement $U^\perp$, meeting at $0$.]


8.43 orthogonal decomposition 


Suppose $U$ is a closed subspace of a Hilbert space $V$. Then every element $f \in V$ can be uniquely written in the form

$$f = g + h,$$

where $g \in U$ and $h \in U^\perp$. Furthermore, $g = P_U f$ and $h = f - P_U f$.

Proof Suppose $f \in V$. Then

$$f = P_U f + (f - P_U f),$$

where $P_U f \in U$ [by definition of $P_U f$ as the element of $U$ that is closest to $f$] and $f - P_U f \in U^\perp$ [by 8.37(a)]. Thus we have the desired decomposition of $f$ as the sum of an element of $U$ and an element of $U^\perp$.

To prove the uniqueness of this decomposition, suppose

$$f = g_1 + h_1 = g_2 + h_2,$$

where $g_1, g_2 \in U$ and $h_1, h_2 \in U^\perp$. Then $g_1 - g_2 = h_2 - h_1 \in U \cap U^\perp$, which implies that $g_1 = g_2$ and $h_1 = h_2$, as desired.


In the next definition, the function $I$ depends upon the vector space $V$. Thus a notation such as $I_V$ might be more precise. However, the domain of $I$ should always be clear from the context.

8.44 Definition identity map; $I$

Suppose $V$ is a vector space. The identity map $I$ is the linear map from $V$ to $V$ defined by $If = f$ for $f \in V$.


The next result highlights the close relationship between orthogonal projections 
and orthogonal complements. 
8.45 range and null space of orthogonal projections

Suppose $U$ is a closed subspace of a Hilbert space $V$. Then

(a) $\operatorname{range} P_U = U$ and $\operatorname{null} P_U = U^\perp$;

(b) $\operatorname{range} P_{U^\perp} = U^\perp$ and $\operatorname{null} P_{U^\perp} = U$;

(c) $P_{U^\perp} = I - P_U$.


Proof The definition of $P_U f$ as the closest point in $U$ to $f$ implies $\operatorname{range} P_U \subseteq U$. Because $P_U g = g$ for all $g \in U$, we also have $U \subseteq \operatorname{range} P_U$. Thus $\operatorname{range} P_U = U$.

If $f \in \operatorname{null} P_U$, then $f \in U^\perp$ [by 8.37(a)]. Thus $\operatorname{null} P_U \subseteq U^\perp$. Conversely, if $f \in U^\perp$, then 8.37(b) (with $h = 0$) implies that $P_U f = 0$; hence $U^\perp \subseteq \operatorname{null} P_U$. Thus $\operatorname{null} P_U = U^\perp$, completing the proof of (a).

Replace $U$ by $U^\perp$ in (a), getting $\operatorname{range} P_{U^\perp} = U^\perp$ and $\operatorname{null} P_{U^\perp} = (U^\perp)^\perp = U$ (where the last equality comes from 8.41), completing the proof of (b).

Finally, if $f \in U$, then

$$P_{U^\perp} f = 0 = f - P_U f = (I - P_U) f,$$

where the first equality above holds because $\operatorname{null} P_{U^\perp} = U$ [by (b)].

If $f \in U^\perp$, then

$$P_{U^\perp} f = f = f - P_U f = (I - P_U) f,$$

where the second equality above holds because $\operatorname{null} P_U = U^\perp$ [by (a)].

The last two displayed equations show that $P_{U^\perp}$ and $I - P_U$ agree on $U$ and agree on $U^\perp$. Because $P_{U^\perp}$ and $I - P_U$ are both linear maps and because each element of $V$ equals some element of $U$ plus some element of $U^\perp$ (by 8.43), this implies that $P_{U^\perp} = I - P_U$, completing the proof of (c).
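A small check of 8.45 in $\mathbf{R}^3$ with the dot product, as a Python sketch (helper names are ours). Take $U = \operatorname{span}\{(1, 1, 1)\}$; then $U^\perp$ is the plane $x + y + z = 0$, which has the orthogonal basis $(1, -1, 0)$, $(1, 1, -2)$. Projecting onto each subspace with the finite-dimensional formula $P_W f = \sum_j \langle f, e_j\rangle e_j$ (for an orthonormal basis $e_j$ of $W$) lets us compare $P_{U^\perp} f$ with $(I - P_U) f$ directly, and also see that $P_U$ kills $U^\perp$ while $P_{U^\perp}$ kills $U$.

```python
import math


def dot(f, g):
    return sum(a * b for a, b in zip(f, g))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def project(f, orthonormal_basis):
    out = [0.0] * len(f)
    for e in orthonormal_basis:
        c = dot(f, e)
        out = [a + c * b for a, b in zip(out, e)]
    return out


U_basis = [normalize([1.0, 1.0, 1.0])]
U_perp_basis = [normalize([1.0, -1.0, 0.0]), normalize([1.0, 1.0, -2.0])]


def p_u(f):
    return project(f, U_basis)


def p_u_perp(f):
    return project(f, U_perp_basis)


def i_minus_p_u(f):
    return [a - b for a, b in zip(f, p_u(f))]


f = [3.0, -1.0, 7.0]
print(p_u_perp(f))
print(i_minus_p_u(f))  # agrees with the line above up to rounding
```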


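8.46 Example $P_{U^\perp} = I - P_U$

Suppose $U$ is the closed subspace of $L^2(\mathbf{R})$ defined by

$$U = \{f \in L^2(\mathbf{R}) : f(x) = 0 \text{ for almost every } x < 0\}.$$

Then, as you should verify,

$$U^\perp = \{f \in L^2(\mathbf{R}) : f(x) = 0 \text{ for almost every } x > 0\}.$$

Furthermore, you should also verify that if $f \in L^2(\mathbf{R})$, then

$$P_U f = f \chi_{[0, \infty)} \quad \text{and} \quad P_{U^\perp} f = f \chi_{(-\infty, 0)}.$$

Thus $P_{U^\perp} f = f(1 - \chi_{[0, \infty)}) = (I - P_U) f$ and hence $P_{U^\perp} = I - P_U$, as asserted in 8.45(c).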
Riesz Representation Theorem

Suppose $h$ is an element of a Hilbert space $V$. Define $\varphi\colon V \to \mathbf{F}$ by $\varphi(f) = \langle f, h\rangle$ for $f \in V$. The properties of an inner product imply that $\varphi$ is a linear functional. The Cauchy–Schwarz inequality (8.11) implies that $|\varphi(f)| \le \|f\|\,\|h\|$ for all $f \in V$, which implies that $\varphi$ is a bounded linear functional on $V$. The next result states that every bounded linear functional on $V$ arises in this fashion.

To motivate the proof of the next result, note that if $\varphi$ is as in the paragraph above, then $\operatorname{null} \varphi = \{h\}^\perp$. Thus $h \in (\operatorname{null} \varphi)^\perp$ [by 8.40(e)]. Hence in the proof of the next result, to find $h$ we start with an element of $(\operatorname{null} \varphi)^\perp$ and then multiply it by a scalar to make everything come out right.


8.47 Riesz Representation Theorem

Suppose $\varphi$ is a bounded linear functional on a Hilbert space $V$. Then there exists a unique $h \in V$ such that

$$\varphi(f) = \langle f, h\rangle$$

for all $f \in V$. Furthermore, $\|\varphi\| = \|h\|$.


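Proof If $\varphi = 0$, take $h = 0$. Thus we can assume $\varphi \ne 0$. Hence $\operatorname{null} \varphi$ is a closed subspace of $V$ not equal to $V$ (see 6.52). The subspace $(\operatorname{null} \varphi)^\perp$ is not $\{0\}$ (by 8.42). Thus there exists $g \in (\operatorname{null} \varphi)^\perp$ with $\|g\| = 1$. Let

$$h = \overline{\varphi(g)}\, g.$$

Taking the norm of both sides of the equation above, we get $\|h\| = |\varphi(g)|$. Thus

$$\varphi(h) = |\varphi(g)|^2 = \|h\|^2. \qquad \text{(8.48)}$$

Now suppose $f \in V$. Then

$$\begin{aligned}
\langle f, h\rangle &= \Bigl\langle f - \frac{\varphi(f)}{\varphi(h)} h, h \Bigr\rangle + \Bigl\langle \frac{\varphi(f)}{\varphi(h)} h, h \Bigr\rangle \\
&= \Bigl\langle \frac{\varphi(f)}{\varphi(h)} h, h \Bigr\rangle \qquad \text{(8.49)} \\
&= \varphi(f),
\end{aligned}$$

where 8.49 holds because $f - \frac{\varphi(f)}{\varphi(h)} h \in \operatorname{null} \varphi$ (by 8.48) and $h$ is orthogonal to all elements of $\operatorname{null} \varphi$.

We have now proved the existence of $h \in V$ such that $\varphi(f) = \langle f, h\rangle$ for all $f \in V$. To prove uniqueness, suppose $\tilde{h} \in V$ has the same property. Then

$$\langle h - \tilde{h}, h - \tilde{h}\rangle = \langle h - \tilde{h}, h\rangle - \langle h - \tilde{h}, \tilde{h}\rangle = \varphi(h - \tilde{h}) - \varphi(h - \tilde{h}) = 0,$$

which implies that $h = \tilde{h}$, which proves uniqueness.

The Cauchy–Schwarz inequality implies that $|\varphi(f)| = |\langle f, h\rangle| \le \|f\|\,\|h\|$ for all $f \in V$, which implies that $\|\varphi\| \le \|h\|$. Because $\varphi(h) = \langle h, h\rangle = \|h\|^2$, we also have $\|\varphi\| \ge \|h\|$. Thus $\|\varphi\| = \|h\|$, completing the proof.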
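In $\mathbf{C}^n$ with $\langle f, g\rangle = \sum_k f_k \overline{g_k}$, the vector $h$ promised by 8.47 can be written down directly: writing $f = \sum_k f_k e_k$ in the standard basis gives $\varphi(f) = \sum_k f_k \varphi(e_k)$, so $h_k = \overline{\varphi(e_k)}$. The Python sketch below (the functional `phi` is a hypothetical example, and the other names are ours) computes $h$ this way and checks both $\varphi(f) = \langle f, h\rangle$ and $|\varphi(h / \|h\|)| = \|h\|$, the value used in the proof to show $\|\varphi\| \ge \|h\|$.

```python
import math


def inner(f, g):
    return sum(a * b.conjugate() for a, b in zip(f, g))


def norm(f):
    return math.sqrt(inner(f, f).real)


def riesz_vector(phi, n):
    """Return h in C^n with phi(f) = <f, h> for every f (finite-dimensional case)."""
    standard_basis = [[1.0 if j == k else 0.0 for j in range(n)] for k in range(n)]
    return [complex(phi(e)).conjugate() for e in standard_basis]


def phi(f):
    """A hypothetical linear functional on C^3."""
    return (2 - 1j) * f[0] + 3j * f[2]


h = riesz_vector(phi, 3)
f = [1 + 1j, 5, 2 - 1j]
print(phi(f), inner(f, h))      # both equal (6+7j)
unit = [x / norm(h) for x in h]
print(abs(phi(unit)), norm(h))  # both equal sqrt(14), up to rounding
```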
Frigyes Riesz proved 8.47 in 1907.

Suppose that $\mu$ is a measure and $1 < p < \infty$. In 7.25 we considered the natural map of $L^{p'}(\mu)$ into $\bigl(L^p(\mu)\bigr)'$, and we showed that this map preserves norms. In the special case where $p = p' = 2$, the Riesz Representation Theorem (8.47) shows that this map is surjective. In other words, if $\varphi$ is a bounded linear functional on $L^2(\mu)$, then there exists $h \in L^2(\mu)$ such that

$$\varphi(f) = \int f h \, d\mu$$

for all $f \in L^2(\mu)$ (take $h$ to be the complex conjugate of the function given by 8.47). Hence we can identify the dual of $L^2(\mu)$ with $L^2(\mu)$. In 9.42 we will deal with other values of $p$. Also see Exercise 25 in this section.


EXERCISES 8B 


1 Show that each of the inner product spaces in Example 8.23 is not a Hilbert 
space. 


2 Prove or disprove: The inner product space in Exercise 1 in Section 8A is a Hilbert space.


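3 Suppose $V_1, V_2, \ldots$ are Hilbert spaces. Let

$$V = \Bigl\{(f_1, f_2, \ldots) \in V_1 \times V_2 \times \cdots : \sum_{k=1}^{\infty} \|f_k\|^2 < \infty\Bigr\}.$$

Show that the equation

$$\bigl\langle (f_1, f_2, \ldots), (g_1, g_2, \ldots) \bigr\rangle = \sum_{k=1}^{\infty} \langle f_k, g_k\rangle$$

defines an inner product on $V$ that makes $V$ a Hilbert space.

[Each of the Hilbert spaces $V_1, V_2, \ldots$ may have a different inner product, even though the same notation is used for the norm and inner product on all these Hilbert spaces.]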


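4 Suppose $V$ is a real Hilbert space. The complexification of $V$ is the complex vector space $V_{\mathbf{C}}$ defined by $V_{\mathbf{C}} = V \times V$, but we write a typical element of $V_{\mathbf{C}}$ as $f + ig$ instead of $(f, g)$. Addition and scalar multiplication are defined on $V_{\mathbf{C}}$ by

$$(f_1 + i g_1) + (f_2 + i g_2) = (f_1 + f_2) + i(g_1 + g_2)$$

and

$$(\alpha + \beta i)(f + ig) = (\alpha f - \beta g) + (\alpha g + \beta f)i$$

for $f_1, f_2, f, g_1, g_2, g \in V$ and $\alpha, \beta \in \mathbf{R}$. Show that

$$\langle f_1 + i g_1, f_2 + i g_2 \rangle = \langle f_1, f_2\rangle + \langle g_1, g_2\rangle + \bigl(\langle g_1, f_2\rangle - \langle f_1, g_2\rangle\bigr) i$$

defines an inner product on $V_{\mathbf{C}}$ that makes $V_{\mathbf{C}}$ into a complex Hilbert space.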


5 Prove that if $V$ is a normed vector space, $f \in V$, and $r > 0$, then the open ball $B(f, r)$ centered at $f$ with radius $r$ is convex.


6 (a) Suppose $V$ is an inner product space and $B$ is the open unit ball in $V$ (thus $B = \{f \in V : \|f\| < 1\}$). Prove that if $U$ is a subset of $V$ such that $B \subseteq U \subseteq \overline{B}$, then $U$ is convex.

(b) Give an example to show that the result in part (a) can fail if the phrase inner product space is replaced by Banach space.


7 Suppose $V$ is a normed vector space and $U$ is a closed subset of $V$. Prove that $U$ is convex if and only if

$$\frac{f + g}{2} \in U \text{ for all } f, g \in U.$$


8 Prove that if $U$ is a convex subset of a normed vector space, then $\overline{U}$ is also convex.


9 Prove that if $U$ is a convex subset of a normed vector space, then the interior of $U$ is also convex.

[The interior of $U$ is the set $\{f \in U : B(f, r) \subseteq U \text{ for some } r > 0\}$.]


10 Suppose $V$ is a Hilbert space, $U$ is a nonempty closed convex subset of $V$, and $g \in U$ is the unique element of $U$ with smallest norm (obtained by taking $f = 0$ in 8.28). Prove that

$$\operatorname{Re} \langle g, h\rangle \ge \|g\|^2$$

for all $h \in U$.


11 Suppose $V$ is a Hilbert space. A closed half-space of $V$ is a set of the form

$$\{g \in V : \operatorname{Re} \langle g, h\rangle \ge c\}$$

for some $h \in V$ and some $c \in \mathbf{R}$. Prove that every closed convex subset of $V$ is the intersection of all the closed half-spaces that contain it.


12 Give an example of a nonempty closed subset $U$ of the Hilbert space $\ell^2$ and $a \in \ell^2$ such that there does not exist $b \in U$ with $\|a - b\| = \operatorname{distance}(a, U)$.

[By 8.28, $U$ cannot be a convex subset of $\ell^2$.]


13 In the real Banach space $\mathbf{R}^2$ with norm defined by $\|(x, y)\|_\infty = \max\{|x|, |y|\}$, give an example of a closed convex set $U \subseteq \mathbf{R}^2$ and $z \in \mathbf{R}^2$ such that there exist infinitely many choices of $w \in U$ with $\|z - w\| = \operatorname{distance}(z, U)$.


14 Suppose $f$ and $g$ are elements of an inner product space. Prove that $\langle f, g\rangle = 0$ if and only if

$$\|f\| \le \|f + \alpha g\|$$

for all $\alpha \in \mathbf{F}$.


15 Suppose $U$ is a closed subspace of a Hilbert space $V$ and $f \in V$. Prove that $\|P_U f\| \le \|f\|$, with equality if and only if $f \in U$.

[This exercise asks you to prove 8.37(d).]
16 Suppose $V$ is a Hilbert space and $P\colon V \to V$ is a linear map such that $P^2 = P$ and $\|Pf\| \le \|f\|$ for every $f \in V$. Prove that there exists a closed subspace $U$ of $V$ such that $P = P_U$.


17 Suppose $U$ is a subspace of a Hilbert space $V$. Suppose also that $W$ is a Banach space and $S\colon U \to W$ is a bounded linear map. Prove that there exists a bounded linear map $T\colon V \to W$ such that $T|_U = S$ and $\|T\| = \|S\|$.

[If $W = \mathbf{F}$, then this result is just the Hahn–Banach Theorem (6.69) for Hilbert spaces. The result here is stronger because it allows $W$ to be an arbitrary Banach space instead of requiring $W$ to be $\mathbf{F}$. Also, the proof in this Hilbert space context does not require use of Zorn’s Lemma or the Axiom of Choice.]


18 Suppose $U$ and $W$ are subspaces of a Hilbert space $V$. Prove that $\overline{U} = \overline{W}$ if and only if $U^\perp = W^\perp$.


19 Suppose $U$ and $W$ are closed subspaces of a Hilbert space. Prove that $P_U P_W = 0$ if and only if $\langle f, g\rangle = 0$ for all $f \in U$ and all $g \in W$.

20 Verify the assertions in Example 8.46.

21 Show that every inner product space is a subspace of some Hilbert space.

Hint: See Exercise 13 in Section 6C.


22 Prove that if $T$ is a bounded linear operator on a Hilbert space $V$ and the dimension of $\operatorname{range} T$ is $1$, then there exist $g, h \in V$ such that

$$Tf = \langle f, g\rangle h$$

for all $f \in V$.


23 (a) Give an example of a Banach space $V$ and a bounded linear functional $\varphi$ on $V$ such that $|\varphi(f)| < \|\varphi\|\,\|f\|$ for all $f \in V \setminus \{0\}$.

(b) Show there does not exist an example in part (a) where $V$ is a Hilbert space.


24 (a) Suppose $\varphi$ and $\psi$ are bounded linear functionals on a Hilbert space $V$ such that $\|\varphi + \psi\| = \|\varphi\| + \|\psi\|$. Prove that one of $\varphi, \psi$ is a scalar multiple of the other.

(b) Give an example to show that part (a) can fail if the hypothesis that $V$ is a Hilbert space is replaced by the hypothesis that $V$ is a Banach space.


25 (a) Suppose that $\mu$ is a finite measure, $1 < p \le 2$, and $\varphi$ is a bounded linear functional on $L^p(\mu)$. Prove that there exists $h \in L^{p'}(\mu)$ such that $\varphi(f) = \int f h \, d\mu$ for every $f \in L^p(\mu)$.

(b) Same as part (a), but with the hypothesis that $\mu$ is a finite measure replaced by the hypothesis that $\mu$ is a measure.

[See 7.25, which along with this exercise shows that we can identify the dual of $L^p(\mu)$ with $L^{p'}(\mu)$ for $1 < p \le 2$. See 9.42 for an extension to all $p \in (1, \infty)$.]


26 Prove that if $V$ is an infinite-dimensional Hilbert space, then the Banach space $\mathcal{B}(V)$ is nonseparable.
8C Orthonormal Bases


Bessel’s Inequality 


Recall that a family $\{e_k\}_{k \in \Gamma}$ in a set $V$ is a function $e$ from a set $\Gamma$ to $V$, with the value of the function $e$ at $k \in \Gamma$ denoted by $e_k$ (see 6.53).


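8.50 Definition orthonormal family

A family $\{e_k\}_{k \in \Gamma}$ in an inner product space is called an orthonormal family if

$$\langle e_j, e_k\rangle = \begin{cases} 0 & \text{if } j \ne k, \\ 1 & \text{if } j = k \end{cases}$$

for all $j, k \in \Gamma$.

In other words, a family $\{e_k\}_{k \in \Gamma}$ is an orthonormal family if $e_j$ and $e_k$ are orthogonal for all distinct $j, k \in \Gamma$ and $\|e_k\| = 1$ for all $k \in \Gamma$.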


8.51 Example orthonormal families 


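• For $k \in \mathbf{Z}^+$, let $e_k$ be the element of $\ell^2$ all of whose coordinates are $0$ except for the $k^{\text{th}}$ coordinate, which is $1$:

$$e_k = (0, \ldots, 0, 1, 0, \ldots).$$

Then $\{e_k\}_{k \in \mathbf{Z}^+}$ is an orthonormal family in $\ell^2$. In this case, our family is a sequence; thus we can call $\{e_k\}_{k \in \mathbf{Z}^+}$ an orthonormal sequence.

• More generally, suppose $\Gamma$ is a nonempty set. The Hilbert space $L^2(\mu)$, where $\mu$ is counting measure on $\Gamma$, is often denoted by $\ell^2(\Gamma)$. For $k \in \Gamma$, define a function $e_k\colon \Gamma \to \mathbf{F}$ by

$$e_k(j) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases}$$

Then $\{e_k\}_{k \in \Gamma}$ is an orthonormal family in $\ell^2(\Gamma)$.

• For $k \in \mathbf{Z}$, define $e_k\colon (-\pi, \pi] \to \mathbf{R}$ by

$$e_k(t) = \begin{cases} \dfrac{1}{\sqrt{\pi}} \sin(kt) & \text{if } k > 0, \\[6pt] \dfrac{1}{\sqrt{2\pi}} & \text{if } k = 0, \\[6pt] \dfrac{1}{\sqrt{\pi}} \cos(kt) & \text{if } k < 0. \end{cases}$$

Then $\{e_k\}_{k \in \mathbf{Z}}$ is an orthonormal family in $L^2((-\pi, \pi])$, as you should verify (see Exercise 1 for useful formulas that will help with this verification).

This orthonormal family $\{e_k\}_{k \in \mathbf{Z}}$ leads to the classical theory of Fourier series, as we will see in more depth in Chapter 11.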
• For $k$ a nonnegative integer, define $e_k\colon [0, 1) \to \mathbf{F}$ by

$$e_k(x) = \begin{cases} 1 & \text{if } x \in \bigl[\frac{n-1}{2^k}, \frac{n}{2^k}\bigr) \text{ for some odd integer } n, \\[4pt] -1 & \text{if } x \in \bigl[\frac{n-1}{2^k}, \frac{n}{2^k}\bigr) \text{ for some even integer } n. \end{cases}$$

The figure below shows the graphs of $e_0$, $e_1$, $e_2$, and $e_3$. The pattern of these graphs should convince you that $\{e_k\}_{k \in \{0, 1, \ldots\}}$ is an orthonormal family in $L^2([0, 1))$.

This orthonormal family was invented by Hans Rademacher (1892–1969).

[Figure: the graphs of $e_0$, $e_1$, $e_2$, and $e_3$.]

• Now we modify the example in the previous bullet point by translating the functions in the previous bullet point by arbitrary integers. Specifically, for $k$ a nonnegative integer and $m \in \mathbf{Z}$, define $e_{k,m}\colon \mathbf{R} \to \mathbf{F}$ by

$$e_{k,m}(x) = \begin{cases} 1 & \text{if } x \in \bigl[m + \frac{n-1}{2^k}, m + \frac{n}{2^k}\bigr) \text{ for some odd integer } n \in [1, 2^k], \\[4pt] -1 & \text{if } x \in \bigl[m + \frac{n-1}{2^k}, m + \frac{n}{2^k}\bigr) \text{ for some even integer } n \in [1, 2^k], \\[4pt] 0 & \text{if } x \notin [m, m + 1). \end{cases}$$

Then $\{e_{k,m}\}_{(k,m) \in \{0, 1, \ldots\} \times \mathbf{Z}}$ is an orthonormal family in $L^2(\mathbf{R})$.

This example illustrates the usefulness of considering families that are not sequences. Although $\{0, 1, \ldots\} \times \mathbf{Z}$ is a countable set and hence we could rewrite $\{e_{k,m}\}_{(k,m) \in \{0, 1, \ldots\} \times \mathbf{Z}}$ as a sequence, doing so would be awkward and would be less clean than the $e_{k,m}$ notation.
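The orthonormality of the first few Rademacher functions (the family $\{e_k\}_{k \in \{0, 1, \ldots\}}$ on $[0, 1)$ in Example 8.51) can be confirmed exactly: $e_0, \ldots, e_K$ are all constant on each dyadic interval $[i/2^K, (i+1)/2^K)$, so each integral $\int_0^1 e_j e_k$ with $j, k \le K$ equals $2^{-K}$ times a sum of values at the midpoints of those intervals. The Python sketch below uses exact rational arithmetic from the standard `fractions` module to avoid rounding; the function names are ours.

```python
from fractions import Fraction
from math import floor


def rademacher(k, x):
    """e_k(x) for x in [0, 1): 1 if x is in [(n-1)/2^k, n/2^k) with n odd, else -1."""
    n = floor(x * 2**k) + 1
    return 1 if n % 2 == 1 else -1


def inner_l2(j, k, K):
    """Exact value of the integral of e_j * e_k over [0, 1), valid when j, k <= K."""
    N = 2**K
    total = 0
    for i in range(N):
        midpoint = Fraction(2 * i + 1, 2 * N)  # midpoint of [i/N, (i+1)/N)
        total += rademacher(j, midpoint) * rademacher(k, midpoint)
    return Fraction(total, N)


K = 5
gram = [[inner_l2(j, k, K) for k in range(K + 1)] for j in range(K + 1)]
print(all(gram[j][k] == (1 if j == k else 0)
          for j in range(K + 1) for k in range(K + 1)))  # True
```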


The next result gives our first indication of why orthonormal families are so useful. 


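8.52 finite orthonormal families

Suppose $\Omega$ is a finite set and $\{e_j\}_{j \in \Omega}$ is an orthonormal family in an inner product space. Then

$$\Bigl\|\sum_{j \in \Omega} \alpha_j e_j\Bigr\|^2 = \sum_{j \in \Omega} |\alpha_j|^2$$

for every family $\{\alpha_j\}_{j \in \Omega}$ in $\mathbf{F}$.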
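Proof Suppose $\{\alpha_j\}_{j \in \Omega}$ is a family in $\mathbf{F}$. Standard properties of inner products show that

$$\begin{aligned}
\Bigl\|\sum_{j \in \Omega} \alpha_j e_j\Bigr\|^2 &= \Bigl\langle \sum_{j \in \Omega} \alpha_j e_j, \sum_{k \in \Omega} \alpha_k e_k \Bigr\rangle \\
&= \sum_{j, k \in \Omega} \alpha_j \overline{\alpha_k} \langle e_j, e_k\rangle \\
&= \sum_{j \in \Omega} |\alpha_j|^2,
\end{aligned}$$

as desired.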
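A numerical illustration of 8.52 in $\mathbf{C}^n$: the vectors $e_j = \frac{1}{\sqrt{n}}\bigl(1, \omega^j, \omega^{2j}, \ldots, \omega^{(n-1)j}\bigr)$ with $\omega = e^{2\pi i/n}$, for $j \in \Omega = \{0, \ldots, n-1\}$, form an orthonormal family (a standard fact about the normalized discrete Fourier vectors, assumed here and spot-checked in the test rather than proved). The Python sketch below forms $\sum_j \alpha_j e_j$ for random coefficients and compares its squared norm with $\sum_j |\alpha_j|^2$; the names are ours.

```python
import cmath
import math
import random


def inner(f, g):
    return sum(a * b.conjugate() for a, b in zip(f, g))


def dft_family(n):
    """Orthonormal family e_0, ..., e_{n-1} in C^n (normalized discrete Fourier vectors)."""
    return [[cmath.exp(2j * math.pi * j * m / n) / math.sqrt(n) for m in range(n)]
            for j in range(n)]


def combination(alphas, family):
    """Return sum_j alpha_j e_j."""
    n = len(family[0])
    return [sum(a * e[m] for a, e in zip(alphas, family)) for m in range(n)]


n = 8
family = dft_family(n)
rng = random.Random(3)
alphas = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
v = combination(alphas, family)
lhs = inner(v, v).real                  # ||sum_j alpha_j e_j||^2
rhs = sum(abs(a) ** 2 for a in alphas)  # sum_j |alpha_j|^2
print(abs(lhs - rhs) < 1e-9)            # True
```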


Suppose $\Omega$ is a finite set and $\{e_j\}_{j \in \Omega}$ is an orthonormal family in an inner product space. The result above implies that if $\sum_{j \in \Omega} \alpha_j e_j = 0$, then $\alpha_j = 0$ for every $j \in \Omega$.

Linear algebra, and algebra more generally, deals with sums of only finitely many terms. However, in analysis we often want to sum infinitely many terms. For example, earlier we defined the infinite sum of a sequence $g_1, g_2, \ldots$ in a normed vector space to be the limit as $n \to \infty$ of the partial sums $\sum_{k=1}^{n} g_k$ if that limit exists (see 6.40).

The next definition captures a more powerful method of dealing with infinite sums. The sum defined below is called an unordered sum because the set $\Gamma$ is not assumed to come with any ordering. A finite unordered sum is defined in the obvious way.


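8.53 Definition unordered sum; $\sum_{k \in \Gamma} f_k$

Suppose $\{f_k\}_{k \in \Gamma}$ is a family in a normed vector space $V$. The unordered sum $\sum_{k \in \Gamma} f_k$ is said to converge if there exists $g \in V$ such that for every $\varepsilon > 0$, there exists a finite subset $\Omega$ of $\Gamma$ such that

$$\Bigl\|g - \sum_{j \in \Omega'} f_j\Bigr\| < \varepsilon$$

for all finite sets $\Omega'$ with $\Omega \subseteq \Omega' \subseteq \Gamma$. If this happens, we set $\sum_{k \in \Gamma} f_k = g$. If there is no such $g \in V$, then $\sum_{k \in \Gamma} f_k$ is left undefined.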


Exercises at the end of this section ask you to develop basic properties of unordered 
sums, including the following: 


• Suppose $\{a_k\}_{k\in\Gamma}$ is a family in $\mathbf{R}$ and $a_k\ge0$ for each $k\in\Gamma$. Then the unordered
sum $\sum_{k\in\Gamma} a_k$ converges if and only if

$$\sup\Big\{\sum_{j\in\Omega} a_j : \Omega \text{ is a finite subset of } \Gamma\Big\} < \infty.$$

Furthermore, if $\sum_{k\in\Gamma} a_k$ converges then it equals the supremum above. If
$\sum_{k\in\Gamma} a_k$ does not converge, then the supremum above is $\infty$ and we write
$\sum_{k\in\Gamma} a_k = \infty$ (this notation should be used only when $a_k\ge0$ for each $k\in\Gamma$).

• Suppose $\{a_k\}_{k\in\Gamma}$ is a family in $\mathbf{R}$. Then the unordered sum $\sum_{k\in\Gamma} a_k$ converges
if and only if $\sum_{k\in\Gamma}|a_k|<\infty$. Thus convergence of an unordered summation in
$\mathbf{R}$ is the same as absolute convergence. As we are about to see, the situation in
more general Hilbert spaces is quite different.
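The second bullet point says that in $\mathbf{R}$ an unordered sum converges only when the sum converges absolutely. A hedged illustration using only the standard library: the alternating harmonic series $1 - \frac12 + \frac13 - \cdots$ converges as an ordered sum but not absolutely, and the sketch below adds the same terms in two different orders. The reference limits $\log 2$ and $\frac32\log 2$ are the classical values for these two orderings; the function names are our own. Since different orderings approach different values, no single $g$ can satisfy Definition 8.53 for this family.

```python
import math


def alternating_harmonic_standard(n_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... using the first n_terms terms in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def alternating_harmonic_rearranged(n_blocks):
    """The same terms in another order: two positive terms, then one negative term.

    Block b (b = 1, 2, ...) contributes 1/(4b-3) + 1/(4b-1) - 1/(2b), so every odd
    reciprocal appears with a plus sign and every even reciprocal with a minus sign.
    """
    total = 0.0
    for b in range(1, n_blocks + 1):
        total += 1 / (4 * b - 3) + 1 / (4 * b - 1) - 1 / (2 * b)
    return total


standard = alternating_harmonic_standard(300_000)
rearranged = alternating_harmonic_rearranged(100_000)
print(standard, math.log(2))          # both close to 0.6931...
print(rearranged, 1.5 * math.log(2))  # both close to 1.0397...
```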


Now we can extend 8.52 to infinite sums. 


8.54 linear combinations of an orthonormal family 


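Suppose $\{e_k\}_{k\in\Gamma}$ is an orthonormal family in a Hilbert space $V$. Suppose $\{a_k\}_{k\in\Gamma}$
is a family in $\mathbf{F}$. Then

(a) the unordered sum $\sum_{k\in\Gamma} a_k e_k$ converges $\iff$ $\sum_{k\in\Gamma}|a_k|^2<\infty$.

Furthermore, if $\sum_{k\in\Gamma} a_k e_k$ converges, then

(b) $\displaystyle\Big\|\sum_{k\in\Gamma} a_k e_k\Big\|^2 = \sum_{k\in\Gamma}|a_k|^2.$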

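Proof First suppose $\sum_{k\in\Gamma} a_k e_k$ converges, with $\sum_{k\in\Gamma} a_k e_k = g$. Suppose $\varepsilon>0$.
Then there exists a finite set $\Omega\subseteq\Gamma$ such that

$$\Big\|g - \sum_{j\in\Omega'} a_j e_j\Big\| < \varepsilon$$

for all finite sets $\Omega'$ with $\Omega\subseteq\Omega'\subseteq\Gamma$. If $\Omega'$ is a finite set with $\Omega\subseteq\Omega'\subseteq\Gamma$, then
the inequality above implies that

$$\|g\| - \varepsilon < \Big\|\sum_{j\in\Omega'} a_j e_j\Big\| < \|g\| + \varepsilon,$$

which (using 8.52) implies that

$$\|g\| - \varepsilon < \Big(\sum_{j\in\Omega'}|a_j|^2\Big)^{1/2} < \|g\| + \varepsilon.$$

Thus $\|g\| = \big(\sum_{k\in\Gamma}|a_k|^2\big)^{1/2}$, completing the proof of one direction of (a) and the
proof of (b).

To prove the other direction of (a), now suppose $\sum_{k\in\Gamma}|a_k|^2<\infty$. Thus there
exists an increasing sequence $\Omega_1\subseteq\Omega_2\subseteq\cdots$ of finite subsets of $\Gamma$ such that for
each $m\in\mathbf{Z}^+$,

$$\sum_{j\in\Omega'\setminus\Omega_m}|a_j|^2 < \frac{1}{m^2} \tag{8.55}$$

for every finite set $\Omega'$ such that $\Omega_m\subseteq\Omega'\subseteq\Gamma$. For each $m\in\mathbf{Z}^+$, let

$$g_m = \sum_{j\in\Omega_m} a_j e_j.$$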
If $n>m$, then 8.52 implies that

$$\|g_n - g_m\|^2 = \sum_{j\in\Omega_n\setminus\Omega_m}|a_j|^2 < \frac{1}{m^2}.$$

Thus $g_1, g_2,\dots$ is a Cauchy sequence and hence converges to some element $g$ of $V$.


Temporarily fixing $m\in\mathbf{Z}^+$ and taking the limit of the inequality above as $n\to\infty$,
we see that

$$\|g - g_m\| \le \frac{1}{m}.$$


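To show that $\sum_{k\in\Gamma} a_k e_k = g$, suppose $\varepsilon>0$. Let $m\in\mathbf{Z}^+$ be such that $\frac{2}{m}<\varepsilon$.
Suppose $\Omega'$ is a finite set with $\Omega_m\subseteq\Omega'\subseteq\Gamma$. Then

$$\begin{aligned}
\Big\|g - \sum_{j\in\Omega'} a_j e_j\Big\| &\le \|g - g_m\| + \Big\|g_m - \sum_{j\in\Omega'} a_j e_j\Big\|\\
&\le \frac{1}{m} + \Big\|\sum_{j\in\Omega'\setminus\Omega_m} a_j e_j\Big\|\\
&= \frac{1}{m} + \Big(\sum_{j\in\Omega'\setminus\Omega_m}|a_j|^2\Big)^{1/2}\\
&< \frac{2}{m} < \varepsilon,
\end{aligned}$$

where the third line comes from 8.52 and the last line comes from 8.55. Thus
$\sum_{k\in\Gamma} a_k e_k = g$, completing the proof.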


8.56 Example a convergent unordered sum need not converge absolutely 


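Suppose $\{e_k\}_{k\in\mathbf{Z}^+}$ is the orthonormal family in $\ell^2$ defined by setting $e_k$ equal to
the sequence that is 0 everywhere except for a 1 in the $k^{\text{th}}$ slot. Then by 8.54, the
unordered sum

$$\sum_{k\in\mathbf{Z}^+}\frac{1}{k}e_k$$

converges in $\ell^2$ (because $\sum_{k\in\mathbf{Z}^+}\frac{1}{k^2}<\infty$) even though $\sum_{k\in\mathbf{Z}^+}\big\|\frac{1}{k}e_k\big\| = \infty$. Note
that $\sum_{k\in\mathbf{Z}^+}\frac{1}{k}e_k = \big(1,\frac12,\frac13,\dots\big)\in\ell^2$.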
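To see the contrast in Example 8.56 numerically, the hedged sketch below (standard library only; the function names are our own) computes the $\ell^2$ norm of the finite partial sum $\sum_{k\le n}\frac1k e_k = (1,\frac12,\dots,\frac1n,0,0,\dots)$, which by 8.52 is $\big(\sum_{k\le n}\frac{1}{k^2}\big)^{1/2}$, alongside $\sum_{k\le n}\|\frac1k e_k\|$, which is a harmonic number. The first settles near $\pi/\sqrt6$ (using the classical value $\sum 1/k^2 = \pi^2/6$); the second grows without bound, roughly like $\log n$.

```python
import math


def l2_norm_of_partial_sum(n):
    """l^2 norm of sum_{k<=n} (1/k) e_k = (1, 1/2, ..., 1/n, 0, 0, ...), computed via 8.52."""
    return math.sqrt(sum(1 / k**2 for k in range(1, n + 1)))


def sum_of_norms(n):
    """sum_{k<=n} ||(1/k) e_k|| = 1 + 1/2 + ... + 1/n."""
    return sum(1 / k for k in range(1, n + 1))


for n in (10, 1_000, 100_000):
    print(n, l2_norm_of_partial_sum(n), sum_of_norms(n))
# The middle column settles near pi / sqrt(6); the last column keeps growing.
```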


Now we prove an important inequality. 


8.57 Bessel’s inequality 


Suppose $\{e_k\}_{k\in\Gamma}$ is an orthonormal family in an inner product space $V$ and $f\in V$.
Then

$$\sum_{k\in\Gamma}|\langle f, e_k\rangle|^2 \le \|f\|^2.$$

Bessel's inequality is named in honor of Friedrich Bessel (1784–1846), who discovered this
inequality in 1828 in the special case of the trigonometric orthonormal family given by the
third bullet point in Example 8.51.

Proof Suppose $\Omega$ is a finite subset of $\Gamma$. Then

$$f = \sum_{j\in\Omega}\langle f, e_j\rangle e_j + \Big(f - \sum_{j\in\Omega}\langle f, e_j\rangle e_j\Big),$$

where the first sum above is orthogonal to the term in parentheses above (as you
should verify).
Applying the Pythagorean Theorem (8.9) to the equation above gives

$$\begin{aligned}
\|f\|^2 &= \Big\|\sum_{j\in\Omega}\langle f, e_j\rangle e_j\Big\|^2 + \Big\|f - \sum_{j\in\Omega}\langle f, e_j\rangle e_j\Big\|^2\\
&\ge \Big\|\sum_{j\in\Omega}\langle f, e_j\rangle e_j\Big\|^2\\
&= \sum_{j\in\Omega}|\langle f, e_j\rangle|^2,
\end{aligned}$$

where the last equality follows from 8.52. Because the inequality above holds for
every finite set $\Omega\subseteq\Gamma$, we conclude that $\|f\|^2 \ge \sum_{k\in\Gamma}|\langle f, e_k\rangle|^2$, as desired.
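The proof of Bessel's inequality splits $f$ into a projection and an orthogonal remainder. The sketch below checks both steps numerically in $\mathbf{C}^3$ with an orthonormal family of two vectors, which is deliberately not a basis. The vectors, the test vector $f = (1, 2, 3)$, and the helper names are illustrative assumptions. For this choice, hand computation gives $\sum_j|\langle f,e_j\rangle|^2 = 5$ and $\|f\|^2 = 14$, with the remainder $(0,0,3)$ accounting for the difference.

```python
import math


def inner(u, v):
    """<u, v> = sum_i u_i * conj(v_i) on C^n."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def norm_sq(u):
    return inner(u, u).real


s = 1 / math.sqrt(2)
family = [[s, s * 1j, 0], [s, -s * 1j, 0]]  # orthonormal, but not a basis of C^3
f = [1, 2, 3]

coeffs = [inner(f, e) for e in family]  # <f, e_j>
bessel_sum = sum(abs(c) ** 2 for c in coeffs)
projection = [sum(c * e[i] for c, e in zip(coeffs, family)) for i in range(3)]
residual = [fi - pi for fi, pi in zip(f, projection)]

print(bessel_sum, norm_sq(f))                   # approximately 5.0 and 14.0
print(norm_sq(projection) + norm_sq(residual))  # approximately 14.0 (Pythagorean Theorem)
```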


Recall that the span of a family $\{e_k\}_{k\in\Gamma}$ in a vector space is the set of finite sums
of the form

$$\sum_{j\in\Omega} a_j e_j,$$

where $\Omega$ is a finite subset of $\Gamma$ and $\{a_j\}_{j\in\Omega}$ is a family in $\mathbf{F}$ (see 6.54). Bessel's
inequality now allows us to prove the following beautiful result showing that the
closure of the span of an orthonormal family is a set of infinite sums.


8.58 closure of the span of an orthonormal family 


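Suppose $\{e_k\}_{k\in\Gamma}$ is an orthonormal family in a Hilbert space $V$. Then

(a) $\displaystyle\overline{\operatorname{span}}\{e_k\}_{k\in\Gamma} = \Big\{\sum_{k\in\Gamma} a_k e_k : \{a_k\}_{k\in\Gamma} \text{ is a family in } \mathbf{F} \text{ and } \sum_{k\in\Gamma}|a_k|^2<\infty\Big\}.$

Furthermore,

(b) $\displaystyle f = \sum_{k\in\Gamma}\langle f, e_k\rangle e_k$

for every $f\in\overline{\operatorname{span}}\{e_k\}_{k\in\Gamma}$.


Proof The right side of (a) above makes sense because of 8.54(a). Furthermore, the
right side of (a) above is a subspace of $V$ because $\ell^2(\Gamma)$ [which equals $L^2(\mu)$, where
$\mu$ is counting measure on $\Gamma$] is closed under addition and scalar multiplication by 7.5.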




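Suppose first $\{a_k\}_{k\in\Gamma}$ is a family in $\mathbf{F}$ and $\sum_{k\in\Gamma}|a_k|^2<\infty$. Let $\varepsilon>0$. Then
there is a finite subset $\Omega$ of $\Gamma$ such that

$$\sum_{j\in\Gamma\setminus\Omega}|a_j|^2 < \varepsilon^2.$$

The inequality above and 8.54(b) imply that

$$\Big\|\sum_{k\in\Gamma} a_k e_k - \sum_{j\in\Omega} a_j e_j\Big\| < \varepsilon.$$

The definition of the closure (see 6.7) now implies that $\sum_{k\in\Gamma} a_k e_k\in\overline{\operatorname{span}}\{e_k\}_{k\in\Gamma}$,
showing that the right side of (a) is contained in the left side of (a).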
To prove the inclusion in the other direction, now suppose $f\in\overline{\operatorname{span}}\{e_k\}_{k\in\Gamma}$. Let

$$g = \sum_{k\in\Gamma}\langle f, e_k\rangle e_k, \tag{8.59}$$

where the sum above converges by Bessel's inequality (8.57) and by 8.54(a). The
direction of the inclusion that we just proved implies that $g\in\overline{\operatorname{span}}\{e_k\}_{k\in\Gamma}$. Thus

$$g - f\in\overline{\operatorname{span}}\{e_k\}_{k\in\Gamma}. \tag{8.60}$$

Equation 8.59 implies that $\langle g, e_j\rangle = \langle f, e_j\rangle$ for each $j\in\Gamma$, as you should verify
(which will require using the Cauchy–Schwarz inequality if done rigorously). Hence

$$\langle g - f, e_k\rangle = 0 \quad\text{for every } k\in\Gamma.$$

This implies that

$$g - f\in\big(\operatorname{span}\{e_k\}_{k\in\Gamma}\big)^{\perp} = \big(\overline{\operatorname{span}}\{e_k\}_{k\in\Gamma}\big)^{\perp},$$

where the equality above comes from 8.40(d). Now 8.60 and the inclusion above
imply that $f = g$ [see 8.40(b)], which along with 8.59 implies that $f$ is in the right
side of (a), completing the proof of (a).

The equations $f = g$ and 8.59 also imply (b).


Parseval’s Identity 


Note that 8.52 implies that every orthonormal family in an inner product space is 
linearly independent (see 6.54 to review the definition of linearly independent and 
basis). Linear algebra deals mainly with finite-dimensional vector spaces, but infinite- 
dimensional vector spaces frequently appear in analysis. The notion of a basis is not 
so useful when doing analysis with infinite-dimensional vector spaces because the 
definition of span does not take advantage of the possibility of summing an infinite 
number of elements. 

However, 8.58 tells us that taking the closure of the span of an orthonormal 
family can capture the sum of infinitely many elements. Thus we make the following 
definition. 
8.61 Definition orthonormal basis

An orthonormal family $\{e_k\}_{k\in\Gamma}$ in a Hilbert space $V$ is called an orthonormal
basis of $V$ if

$$\overline{\operatorname{span}}\{e_k\}_{k\in\Gamma} = V.$$


In addition to requiring orthonormality (which implies linear independence), the
definition above differs from the definition of a basis by considering the closure of
the span rather than the span. An important point to keep in mind is that despite the
terminology, an orthonormal basis is not necessarily a basis in the sense of 6.54. In
fact, if $\Gamma$ is an infinite set and $\{e_k\}_{k\in\Gamma}$ is an orthonormal basis of $V$, then $\{e_k\}_{k\in\Gamma}$ is
not a basis of $V$ (see Exercise 9).


8.62 Example orthonormal bases 


• For $n\in\mathbf{Z}^+$ and $k\in\{1,\dots,n\}$, let $e_k$ be the element of $\mathbf{F}^n$ all of whose
coordinates are 0 except the $k^{\text{th}}$ coordinate, which is 1:

$$e_k = (0,\dots,0,1,0,\dots,0).$$

Then $\{e_k\}_{k\in\{1,\dots,n\}}$ is an orthonormal basis of $\mathbf{F}^n$.

• Let $e_1 = \big(\frac{1}{\sqrt3},\frac{1}{\sqrt3},\frac{1}{\sqrt3}\big)$, $e_2 = \big(-\frac{1}{\sqrt2},\frac{1}{\sqrt2},0\big)$, and $e_3 = \big(\frac{1}{\sqrt6},\frac{1}{\sqrt6},-\frac{2}{\sqrt6}\big)$. Then
$\{e_k\}_{k\in\{1,2,3\}}$ is an orthonormal basis of $\mathbf{F}^3$, as you should verify.

• The first three bullet points in 8.51 are examples of orthonormal families that are
orthonormal bases. The exercises ask you to verify that we have an orthonormal
basis in the first and second bullet points of 8.51. For the third bullet point
(trigonometric functions), see Exercise 11 in Section 10D or see Chapter 11.
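One way to carry out the verification requested in the second bullet point is to compute all pairwise inner products. The floating-point sketch below (helper name `dot` is our own) builds the $3\times3$ Gram matrix $\big(\langle e_j, e_k\rangle\big)$, which should be the identity up to rounding. Because three orthonormal vectors in a 3-dimensional space span it, and a finite-dimensional span is already closed, this is enough to conclude they form an orthonormal basis.

```python
import math

e1 = (1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3))
e2 = (-1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)
e3 = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
basis = (e1, e2, e3)


def dot(u, v):
    """Inner product of real vectors (the vectors here have real entries)."""
    return sum(a * b for a, b in zip(u, v))


gram = [[dot(u, v) for v in basis] for u in basis]
for row in gram:
    print([round(x, 12) for x in row])  # rows of the identity matrix, up to rounding
```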


The next result shows why orthonormal bases are so useful: a Hilbert space with
an orthonormal basis $\{e_k\}_{k\in\Gamma}$ behaves like $\ell^2(\Gamma)$.


8.63 Parseval’s identity 


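Suppose $\{e_k\}_{k\in\Gamma}$ is an orthonormal basis of a Hilbert space $V$ and $f, g\in V$. Then

(a) $\displaystyle f = \sum_{k\in\Gamma}\langle f, e_k\rangle e_k$;

(b) $\displaystyle\langle f, g\rangle = \sum_{k\in\Gamma}\langle f, e_k\rangle\overline{\langle g, e_k\rangle}$;

(c) $\displaystyle\|f\|^2 = \sum_{k\in\Gamma}|\langle f, e_k\rangle|^2$.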


Proof The equation in (a) follows immediately from 8.58(b) and the definition of an 
orthonormal basis. 
To prove (b), note that

$$\begin{aligned}
\langle f, g\rangle &= \Big\langle\sum_{k\in\Gamma}\langle f, e_k\rangle e_k,\ g\Big\rangle\\
&= \sum_{k\in\Gamma}\langle f, e_k\rangle\langle e_k, g\rangle\\
&= \sum_{k\in\Gamma}\langle f, e_k\rangle\overline{\langle g, e_k\rangle},
\end{aligned}$$

where the first equation follows from (a) and the second equation follows from the
definition of an unordered sum and the Cauchy–Schwarz inequality.
Equation (c) follows from setting $g = f$ in (b). An alternative proof: equation (c)
follows from 8.54(b) and the equation $f = \sum_{k\in\Gamma}\langle f, e_k\rangle e_k$ from (a).

Equation (c) is called Parseval's identity in honor of Marc-Antoine Parseval (1755–1836),
who discovered a special case in 1799.
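Parseval's identity can be checked numerically in a finite-dimensional case. The sketch below is a hedged illustration, not a proof: it uses an orthonormal basis of $\mathbf{C}^3$ and two arbitrary vectors $f, g$ (all our own choices), and compares each side of (a), (b), and (c). The helper `inner` is linear in its first slot and conjugate-linear in its second, so (b) includes a complex conjugate on the coefficients of $g$.

```python
import math


def inner(u, v):
    """<u, v> = sum_i u_i * conj(v_i) on C^n."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


s = 1 / math.sqrt(2)
basis = [[s, s * 1j, 0], [s, -s * 1j, 0], [0, 0, 1]]  # an orthonormal basis of C^3

f = [1 + 2j, -0.5, 3j]
g = [2, 1 - 1j, -1 + 0.25j]

fc = [inner(f, e) for e in basis]  # <f, e_k>
gc = [inner(g, e) for e in basis]  # <g, e_k>

# (a) f is recovered from its coefficients.
f_rebuilt = [sum(c * e[i] for c, e in zip(fc, basis)) for i in range(3)]
# (b) <f, g> = sum_k <f, e_k> * conj(<g, e_k>).
lhs_b, rhs_b = inner(f, g), sum(a * b.conjugate() for a, b in zip(fc, gc))
# (c) ||f||^2 = sum_k |<f, e_k>|^2.
lhs_c, rhs_c = inner(f, f).real, sum(abs(c) ** 2 for c in fc)
print(f_rebuilt, lhs_b, rhs_b, lhs_c, rhs_c)
```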


Gram–Schmidt Process and Existence of Orthonormal Bases


8.64 Definition separable 


A normed vector space is called separable if it has a countable subset whose 
closure equals the whole space. 


8.65 Example separable normed vector spaces 


• Suppose $n\in\mathbf{Z}^+$. Then $\mathbf{F}^n$ with the usual Hilbert space norm is separable
because the closure of the countable set

$$\{(c_1,\dots,c_n)\in\mathbf{F}^n : \text{each } c_j \text{ is rational}\}$$

equals $\mathbf{F}^n$ (in case $\mathbf{F} = \mathbf{C}$: to say that a complex number is rational in this
context means that both the real and imaginary parts of the complex number are
rational numbers in the usual sense).

• The Hilbert space $\ell^2$ is separable because the closure of the countable set

$$\bigcup_{n=1}^{\infty}\{(c_1,\dots,c_n,0,0,\dots)\in\ell^2 : \text{each } c_j \text{ is rational}\}$$

is $\ell^2$.

• The Hilbert spaces $L^2([0,1])$ and $L^2(\mathbf{R})$ are separable, as Exercise 13 asks you
to verify [hint: consider finite linear combinations with rational coefficients of
functions of the form $\chi_{(c,d)}$, where $c$ and $d$ are rational numbers].
A moment's thought about the definition of closure (see 6.7) shows that a normed
vector space $V$ is separable if and only if there exists a countable subset $C$ of $V$ such
that every open ball in $V$ contains at least one element of $C$.


8.66 Example nonseparable normed vector spaces 


• Suppose $\Gamma$ is an uncountable set. Then the Hilbert space $\ell^2(\Gamma)$ is not separable.
To see this, note that $\|\chi_{\{j\}} - \chi_{\{k\}}\| = \sqrt2$ for all $j,k\in\Gamma$ with $j\ne k$. Hence

$$\Big\{B\Big(\chi_{\{k\}},\tfrac{\sqrt2}{2}\Big) : k\in\Gamma\Big\}$$

is an uncountable collection of disjoint open balls in $\ell^2(\Gamma)$; no countable set can
have at least one element in each of these balls.

• The Banach space $L^\infty([0,1])$ is not separable. Here $\|\chi_{[0,s]} - \chi_{[0,t]}\|_\infty = 1$ for all
$s,t\in[0,1]$ with $s\ne t$. Thus

$$\Big\{B\Big(\chi_{[0,t]},\tfrac12\Big) : t\in[0,1]\Big\}$$

is an uncountable collection of disjoint open balls in $L^\infty([0,1])$.


We present two proofs of the existence of orthonormal bases of Hilbert spaces.
The first proof works only for separable Hilbert spaces, but it gives a useful algorithm,
called the Gram–Schmidt process, for constructing orthonormal sequences. The
second proof works for all Hilbert spaces, but it uses a result that depends upon the
Axiom of Choice.

Which proof should you read? In practice, the Hilbert spaces you will encounter 
will almost certainly be separable. Thus the first proof suffices, and it has the 
additional benefit of introducing you to a widely used algorithm. The second proof 
uses an entirely different approach and has the advantage of applying to separable 
and nonseparable Hilbert spaces. For maximum learning, read both proofs! 


8.67 existence of orthonormal bases for separable Hilbert spaces 


Every separable Hilbert space has an orthonormal basis. 


Proof Suppose $V$ is a separable Hilbert space and $\{f_1, f_2,\dots\}$ is a countable subset
of $V$ whose closure equals $V$. We will inductively define an orthonormal sequence
$\{e_k\}_{k\in\mathbf{Z}^+}$ such that

$$\operatorname{span}\{f_1,\dots,f_n\}\subseteq\operatorname{span}\{e_1,\dots,e_n\} \tag{8.68}$$

for each $n\in\mathbf{Z}^+$. This will imply that $\overline{\operatorname{span}}\{e_k\}_{k\in\mathbf{Z}^+} = V$, which will mean that
$\{e_k\}_{k\in\mathbf{Z}^+}$ is an orthonormal basis of $V$.
To get started with the induction, set $e_1 = f_1/\|f_1\|$ (we can assume that $f_1\ne0$).
Now suppose $n\in\mathbf{Z}^+$ and $e_1,\dots,e_n$ have been chosen so that $\{e_k\}_{k\in\{1,\dots,n\}}$ is
an orthonormal family in $V$ and 8.68 holds. If $f_k\in\operatorname{span}\{e_1,\dots,e_n\}$ for every
$k\in\mathbf{Z}^+$, then $\{e_k\}_{k\in\{1,\dots,n\}}$ is an orthonormal basis of $V$ (completing the proof) and
the process should be stopped. Otherwise, let $m$ be the smallest positive integer such
that

$$f_m\notin\operatorname{span}\{e_1,\dots,e_n\}. \tag{8.69}$$

Define $e_{n+1}$ by

$$e_{n+1} = \frac{f_m - \langle f_m, e_1\rangle e_1 - \cdots - \langle f_m, e_n\rangle e_n}{\|f_m - \langle f_m, e_1\rangle e_1 - \cdots - \langle f_m, e_n\rangle e_n\|}. \tag{8.70}$$

Clearly $\|e_{n+1}\| = 1$ (8.69 guarantees there is no division by 0). If
$k\in\{1,\dots,n\}$, then the equation above implies that $\langle e_{n+1}, e_k\rangle = 0$. Thus
$\{e_k\}_{k\in\{1,\dots,n+1\}}$ is an orthonormal family in $V$. Also, 8.68 and the choice of $m$
as the smallest positive integer satisfying 8.69 imply that

$$\operatorname{span}\{f_1,\dots,f_{n+1}\}\subseteq\operatorname{span}\{e_1,\dots,e_{n+1}\},$$

completing the induction and completing the proof.

Jørgen Gram (1850–1916) and Erhard Schmidt (1876–1959) popularized this process that
constructs orthonormal sequences.
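The construction in the proof of 8.67 translates directly into a procedure on a finite list of vectors in $\mathbf{C}^n$. The sketch below is a hedged illustration: it follows formula 8.70 and skips any $f$ that already lies in the span of the $e$'s found so far, as the proof does. The floating-point tolerance `tol` is our own assumption, needed because the proof uses exact equality while floating-point residuals are rarely exactly zero. The input vectors are an arbitrary example in which the second vector is a multiple of the first.

```python
import math


def inner(u, v):
    """<u, v> = sum_i u_i * conj(v_i) on C^n."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize vectors in order, skipping any that are (numerically) in the current span.

    Each new e is (f - sum_j <f, e_j> e_j) / ||f - sum_j <f, e_j> e_j||, as in formula 8.70.
    """
    basis = []
    for f in vectors:
        residual = [complex(x) for x in f]
        for e in basis:
            c = inner(f, e)  # <f, e_j>
            residual = [r - c * ei for r, ei in zip(residual, e)]
        length = math.sqrt(inner(residual, residual).real)
        if length > tol:  # f is not in the span of the e's found so far
            basis.append([r / length for r in residual])
    return basis


vectors = [[1, 1, 0], [2, 2, 0], [1, 0, 1], [0, 1, 1]]
es = gram_schmidt(vectors)
print(len(es))  # 3: the dependent vector [2, 2, 0] is skipped
```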
Before considering nonseparable Hilbert spaces, we take a short detour to illustrate
how the Gram–Schmidt process used in the previous proof can be used to find closest
elements to subspaces. We begin with a result connecting the orthogonal projection
onto a closed subspace with an orthonormal basis of that subspace.


8.71 orthogonal projection in terms of an orthonormal basis 


Suppose that $U$ is a closed subspace of a Hilbert space $V$ and $\{e_k\}_{k\in\Gamma}$ is an
orthonormal basis of $U$. Then

$$P_U f = \sum_{k\in\Gamma}\langle f, e_k\rangle e_k$$

for all $f\in V$.


Proof Let $f\in V$. If $k\in\Gamma$, then

$$\langle f, e_k\rangle = \langle f - P_U f, e_k\rangle + \langle P_U f, e_k\rangle = \langle P_U f, e_k\rangle, \tag{8.72}$$

where the last equality follows from 8.37(a). Now

$$P_U f = \sum_{k\in\Gamma}\langle P_U f, e_k\rangle e_k = \sum_{k\in\Gamma}\langle f, e_k\rangle e_k,$$

where the first equality follows from Parseval's identity [8.63(a)] as applied to $U$ and
its orthonormal basis $\{e_k\}_{k\in\Gamma}$, and the second equality follows from 8.72.
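A small numerical instance of 8.71, offered as a sketch rather than part of the text: take $U = \operatorname{span}\{(1,1,0),(1,0,1)\}$ in $\mathbf{R}^3$, whose orthonormal basis we computed by hand with Gram–Schmidt, and project $f = (3,-1,2)$. Both $U$ and $f$ are illustrative choices. By hand, $P_U f = (\frac73,-\frac13,\frac83)$ and $f - P_U f = (\frac23,-\frac23,-\frac23)$, which is orthogonal to $U$. The random sampling loop is only a spot check that no sampled point of $U$ is closer to $f$.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    n = math.sqrt(dot(u, u))
    return [x / n for x in u]


# Orthonormal basis of U = span{(1, 1, 0), (1, 0, 1)}, obtained by Gram-Schmidt by hand.
e1 = normalize([1, 1, 0])
e2 = normalize([0.5, -0.5, 1])
basis = [e1, e2]


def project(f, onb):
    """P_U f = sum_k <f, e_k> e_k, valid when onb is an orthonormal basis of U."""
    coeffs = [dot(f, e) for e in onb]
    return [sum(c * e[i] for c, e in zip(coeffs, onb)) for i in range(len(f))]


f = [3.0, -1.0, 2.0]
p = project(f, basis)
residual = [a - b for a, b in zip(f, p)]
print(p, [dot(residual, e) for e in basis])  # residual is orthogonal to U

# Spot check: no randomly sampled element of U is closer to f than P_U f.
random.seed(1)
best = math.dist(f, p)
for _ in range(1000):
    a, b = random.uniform(-5, 5), random.uniform(-5, 5)
    u = [a * x + b * y for x, y in zip(e1, e2)]
    assert math.dist(f, u) >= best - 1e-12
```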
8.73 Example best approximation

Find the polynomial $g$ of degree at most 10 that minimizes

$$\int_{-1}^{1}\big|\sqrt{|x|} - g(x)\big|^2\,dx.$$

Solution We will work in the real Hilbert space $L^2([-1,1])$ with the usual inner
product $\langle g, h\rangle = \int_{-1}^{1} gh$. For $k\in\{0,1,\dots,10\}$, let $f_k\in L^2([-1,1])$ be defined by
$f_k(x) = x^k$. Let $U$ be the subspace of $L^2([-1,1])$ defined by

$$U = \operatorname{span}\{f_k\}_{k\in\{0,\dots,10\}}.$$

Apply the Gram–Schmidt process from the proof of 8.67 to $\{f_k\}_{k\in\{0,\dots,10\}}$, producing an
orthonormal basis $\{e_k\}_{k\in\{0,\dots,10\}}$ of $U$, which is a closed subspace of
$L^2([-1,1])$ (see Exercise 8). The point here is that $\{e_k\}_{k\in\{0,\dots,10\}}$ can be computed
explicitly and exactly by using 8.70 and evaluating some integrals (using software that
can do exact rational arithmetic will make the process easier), getting $e_0(x) = 1/\sqrt2$,
$e_1(x) = \sqrt6\,x/2$, $\dots$, up to

$$e_{10}(x) = \frac{\sqrt{42}}{512}\big(-63 + 3465x^2 - 30030x^4 + 90090x^6 - 109395x^8 + 46189x^{10}\big).$$

Define $f\in L^2([-1,1])$ by $f(x) = \sqrt{|x|}$. Because $U$ is the subspace of
$L^2([-1,1])$ consisting of polynomials of degree at most 10 and $P_U f$ equals the
element of $U$ closest to $f$ (see 8.34), the formula in 8.71 tells us that the solution $g$ to
our minimization problem is given by the formula

$$g = \sum_{k=0}^{10}\langle f, e_k\rangle e_k.$$

Using the explicit expressions for $e_0,\dots,e_{10}$ and again evaluating some integrals,
this gives

$$g(x) = \frac{693 + 15015x^2 - 64350x^4 + 139230x^6 - 138567x^8 + 51051x^{10}}{2944}.$$

The figure here shows the graph of $f(x) = \sqrt{|x|}$ (red) and the graph of
its closest polynomial $g$ (blue) of degree at most 10; here closest means as
measured in the norm of $L^2([-1,1])$.

The approximation of $f$ by $g$ is pretty good, especially considering
that $f$ is not differentiable at 0 and thus a Taylor series expansion for $f$ does
not make sense.
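The text suggests using software with exact rational arithmetic. One way to do this with only Python's standard `fractions` module is sketched below, under a deliberate substitution that we flag plainly: we run an orthogonal but unnormalized Gram–Schmidt on $1, x, \dots, x^{10}$, so no square roots appear, and we use $\langle f, u_k\rangle u_k/\langle u_k, u_k\rangle$ in place of $\langle f, e_k\rangle e_k$. These terms are equal, because $e_k = u_k/\|u_k\|$. The moments used are $\int_{-1}^1 x^n\,dx = \frac{2}{n+1}$ for even $n$ (zero for odd $n$) and $\int_{-1}^1 |x|^{1/2}x^i\,dx = \frac{4}{2i+3}$ for even $i$ (zero for odd $i$). Polynomials are represented as coefficient lists indexed by power; the helper names are hypothetical.

```python
from fractions import Fraction


def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1]; p, q are coefficient lists (index = power)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to 0 over [-1, 1]
                total += a * b * Fraction(2, i + j + 1)
    return total


def inner_with_sqrt_abs(p):
    """<f, p> for f(x) = sqrt(|x|): for even i, the integral of |x|^(1/2) x^i is 4 / (2i + 3)."""
    return sum((a * Fraction(4, 2 * i + 3) for i, a in enumerate(p) if i % 2 == 0), Fraction(0))


def add_multiple(p, q, c):
    """Return p + c*q, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + c * b for a, b in zip(p, q)]


def best_approximation(degree):
    # Orthogonal (not normalized) Gram-Schmidt applied to 1, x, ..., x^degree.
    orthogonal = []
    for k in range(degree + 1):
        u = [Fraction(0)] * k + [Fraction(1)]  # the monomial x^k
        for v in orthogonal:
            u = add_multiple(u, v, -inner(u, v) / inner(v, v))
        orthogonal.append(u)
    # P_U f = sum_k <f, u_k> u_k / <u_k, u_k>, which equals sum_k <f, e_k> e_k.
    g = [Fraction(0)] * (degree + 1)
    for u in orthogonal:
        g = add_multiple(g, u, inner_with_sqrt_abs(u) / inner(u, u))
    return g


g = best_approximation(10)
print([int(c * 2944) for c in g])
# [693, 0, 15015, 0, -64350, 0, 139230, 0, -138567, 0, 51051]
```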
Recall that a subset $\Gamma$ of a set $V$ can be thought of as a family in $V$ by considering
$\{e_f\}_{f\in\Gamma}$, where $e_f = f$. With this convention, a subset $\Gamma$ of an inner product space
$V$ is an orthonormal subset of $V$ if $\|f\| = 1$ for all $f\in\Gamma$ and $\langle f, g\rangle = 0$ for all
$f, g\in\Gamma$ with $f\ne g$.

The next result characterizes the orthonormal bases as the maximal elements
among the collection of orthonormal subsets of a Hilbert space. Recall that a set
$\Gamma\in\mathcal{A}$ in a collection $\mathcal{A}$ of subsets of a set $V$ is a maximal element of $\mathcal{A}$ if there does
not exist $\Gamma'\in\mathcal{A}$ such that $\Gamma\subsetneq\Gamma'$ (see 6.55).


8.74 orthonormal bases as maximal elements 


Suppose $V$ is a Hilbert space, $\mathcal{A}$ is the collection of all orthonormal subsets of $V$,
and $\Gamma$ is an orthonormal subset of $V$. Then $\Gamma$ is an orthonormal basis of $V$ if and
only if $\Gamma$ is a maximal element of $\mathcal{A}$.


Proof First suppose $\Gamma$ is an orthonormal basis of $V$. Parseval's identity [8.63(a)]
implies that the only element of $V$ that is orthogonal to every element of $\Gamma$ is 0. Thus
there does not exist an orthonormal subset of $V$ that strictly contains $\Gamma$. In other
words, $\Gamma$ is a maximal element of $\mathcal{A}$.
To prove the other direction, suppose now that $\Gamma$ is a maximal element of $\mathcal{A}$. Let
$U$ denote the span of $\Gamma$. Then

$$U^{\perp} = \{0\}$$

because if $f$ is a nonzero element of $U^{\perp}$, then $\Gamma\cup\{f/\|f\|\}$ is an orthonormal subset
of $V$ that strictly contains $\Gamma$. Hence $\overline{U} = V$ (by 8.42), which implies that $\Gamma$ is an
orthonormal basis of $V$.


Now we are ready to prove that every Hilbert space has an orthonormal basis. 
Before reading the next proof, you may want to review the definition of a chain (6.58), 
which is a collection of sets such that for each pair of sets in the collection, one of 
them is contained in the other. You should also review Zorn’s Lemma (6.60), which 
gives a way to show that a collection of sets contains a maximal element. 


8.75 existence of orthonormal bases for all Hilbert spaces 


Every Hilbert space has an orthonormal basis. 


Proof Suppose $V$ is a Hilbert space. Let $\mathcal{A}$ be the collection of all orthonormal
subsets of $V$. Suppose $\mathcal{C}\subseteq\mathcal{A}$ is a chain. Let $L$ be the union of all the sets in $\mathcal{C}$. If
$f\in L$, then $\|f\| = 1$ because $f$ is an element of some orthonormal subset of $V$ that
is contained in $\mathcal{C}$.

If $f, g\in L$ with $f\ne g$, then there exist orthonormal subsets $\Omega$ and $\Gamma$ in $\mathcal{C}$ such
that $f\in\Omega$ and $g\in\Gamma$. Because $\mathcal{C}$ is a chain, either $\Omega\subseteq\Gamma$ or $\Gamma\subseteq\Omega$. Either way,
there is an orthonormal subset of $V$ that contains both $f$ and $g$. Thus $\langle f, g\rangle = 0$.

We have shown that $L$ is an orthonormal subset of $V$; in other words, $L\in\mathcal{A}$.
Thus Zorn's Lemma (6.60) implies that $\mathcal{A}$ has a maximal element. Now 8.74 implies
that $V$ has an orthonormal basis.




Riesz Representation Theorem, Revisited 


Now that we know that every Hilbert space has an orthonormal basis, we can give a 
completely different proof of the Riesz Representation Theorem (8.47) than the proof 
we gave earlier. 

Note that the new proof below of the Riesz Representation Theorem gives the
formula 8.77 for $h$ in terms of an orthonormal basis. One interesting feature of this
formula is that $h$ is uniquely determined by $\varphi$ and thus $h$ does not depend upon the
choice of an orthonormal basis. Hence despite its appearance, the right side of 8.77
is independent of the choice of an orthonormal basis.


8.76 Riesz Representation Theorem 


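Suppose $\varphi$ is a bounded linear functional on a Hilbert space $V$ and $\{e_k\}_{k\in\Gamma}$ is an
orthonormal basis of $V$. Let

$$h = \sum_{k\in\Gamma}\overline{\varphi(e_k)}\,e_k. \tag{8.77}$$

Then

$$\varphi(f) = \langle f, h\rangle \tag{8.78}$$

for all $f\in V$. Furthermore, $\|\varphi\| = \big(\sum_{k\in\Gamma}|\varphi(e_k)|^2\big)^{1/2}$.


Proof First we must show that the sum defining $h$ makes sense. To do this, suppose $\Omega$
is a finite subset of $\Gamma$. Then

$$\sum_{j\in\Omega}|\varphi(e_j)|^2 = \varphi\Big(\sum_{j\in\Omega}\overline{\varphi(e_j)}\,e_j\Big) \le \|\varphi\|\,\Big\|\sum_{j\in\Omega}\overline{\varphi(e_j)}\,e_j\Big\| = \|\varphi\|\Big(\sum_{j\in\Omega}|\varphi(e_j)|^2\Big)^{1/2},$$

where the last equality follows from 8.52. If $\sum_{j\in\Omega}|\varphi(e_j)|^2\ne0$, dividing by $\big(\sum_{j\in\Omega}|\varphi(e_j)|^2\big)^{1/2}$ gives
the inequality below (which holds trivially otherwise):

$$\Big(\sum_{j\in\Omega}|\varphi(e_j)|^2\Big)^{1/2}\le\|\varphi\|.$$

Because the inequality above holds for every finite subset $\Omega$ of $\Gamma$, we conclude that

$$\sum_{k\in\Gamma}|\varphi(e_k)|^2\le\|\varphi\|^2.$$

Thus the sum defining $h$ makes sense (by 8.54) in equation 8.77.
Now 8.77 shows that $\langle h, e_j\rangle = \overline{\varphi(e_j)}$ for each $j\in\Gamma$. Thus if $f\in V$ then

$$\varphi(f) = \varphi\Big(\sum_{k\in\Gamma}\langle f, e_k\rangle e_k\Big) = \sum_{k\in\Gamma}\langle f, e_k\rangle\varphi(e_k) = \sum_{k\in\Gamma}\langle f, e_k\rangle\overline{\langle h, e_k\rangle} = \langle f, h\rangle,$$

where the first and last equalities follow from 8.63 and the second equality follows
from the boundedness/continuity of $\varphi$. Thus 8.78 holds.
Finally, the Cauchy–Schwarz inequality, equation 8.78, and the equation $\varphi(h) = \langle h, h\rangle$
show that $\|\varphi\| = \|h\| = \big(\sum_{k\in\Gamma}|\varphi(e_k)|^2\big)^{1/2}$.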
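Formula 8.77 can be tried out in $\mathbf{C}^3$. In the hedged sketch below, the functional $\varphi(f) = \sum_i c_i f_i$, the vector $c$, the test vector $f$, and the orthonormal basis are all illustrative choices of ours. Building $h = \sum_k \overline{\varphi(e_k)}\,e_k$ should give $\varphi(f) = \langle f, h\rangle$. For this functional, $h$ should equal the coordinatewise conjugate of $c$, and $\|\varphi\| = \|h\|$ should be the Euclidean norm of $c$.

```python
import math


def inner(u, v):
    """<u, v> = sum_i u_i * conj(v_i) on C^n."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


c = [2, -1j, 0.5 + 1j]  # hypothetical coefficients defining the functional


def phi(f):
    """A linear functional on C^3 (illustrative): phi(f) = sum_i c_i f_i, with no conjugation."""
    return sum(ci * fi for ci, fi in zip(c, f))


s = 1 / math.sqrt(2)
basis = [[s, s * 1j, 0], [s, -s * 1j, 0], [0, 0, 1]]  # an orthonormal basis of C^3

# Formula 8.77: h = sum_k conj(phi(e_k)) e_k.
h = [sum(complex(phi(e)).conjugate() * e[i] for e in basis) for i in range(3)]

f = [1 - 1j, 2, 3j]
print(phi(f), inner(f, h))  # equal up to rounding, as 8.78 asserts
norm_h = math.sqrt(inner(h, h).real)
print(norm_h, math.sqrt(sum(abs(phi(e)) ** 2 for e in basis)))
```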
EXERCISES 8C


1 Verify that the family $\{e_k\}_{k\in\mathbf{Z}}$ as defined in the third bullet point of Example
8.51 is an orthonormal family in $L^2((-\pi,\pi])$. The following formulas should
help:

$$(\sin x)(\cos y) = \frac{\sin(x+y) + \sin(x-y)}{2},$$

$$(\sin x)(\sin y) = \frac{\cos(x-y) - \cos(x+y)}{2},$$

$$(\cos x)(\cos y) = \frac{\cos(x+y) + \cos(x-y)}{2}.$$


2 Suppose $\{a_k\}_{k\in\Gamma}$ is a family in $\mathbf{R}$ and $a_k\ge0$ for each $k\in\Gamma$. Prove the
unordered sum $\sum_{k\in\Gamma} a_k$ converges if and only if

$$\sup\Big\{\sum_{j\in\Omega} a_j : \Omega \text{ is a finite subset of } \Gamma\Big\} < \infty.$$

Furthermore, prove that if $\sum_{k\in\Gamma} a_k$ converges then it equals the supremum above.


3 Suppose $\{e_k\}_{k\in\Gamma}$ is an orthonormal family in an inner product space $V$. Prove
that if $f\in V$, then $\{k\in\Gamma : \langle f, e_k\rangle\ne0\}$ is a countable set.


4 Suppose $\{f_k\}_{k\in\Gamma}$ and $\{g_k\}_{k\in\Gamma}$ are families in a normed vector space such that
$\sum_{k\in\Gamma} f_k$ and $\sum_{k\in\Gamma} g_k$ converge. Prove that $\sum_{k\in\Gamma}(f_k + g_k)$ converges and

$$\sum_{k\in\Gamma}(f_k + g_k) = \sum_{k\in\Gamma} f_k + \sum_{k\in\Gamma} g_k.$$


5 Suppose $\{f_k\}_{k\in\Gamma}$ is a family in a normed vector space such that $\sum_{k\in\Gamma} f_k$ con-
verges. Prove that if $c\in\mathbf{F}$, then $\sum_{k\in\Gamma}(cf_k)$ converges and

$$\sum_{k\in\Gamma}(cf_k) = c\sum_{k\in\Gamma} f_k.$$


6 Suppose $\{a_k\}_{k\in\Gamma}$ is a family in $\mathbf{R}$. Prove that the unordered sum $\sum_{k\in\Gamma} a_k$
converges if and only if $\sum_{k\in\Gamma}|a_k|<\infty$.


7 Suppose $\{f_k\}_{k\in\mathbf{Z}^+}$ is a family in a normed vector space. Prove that the un-
ordered sum $\sum_{k\in\mathbf{Z}^+} f_k$ converges if and only if the usual ordered sum $\sum_{k=1}^{\infty} f_{p(k)}$
converges for every injective function $p\colon\mathbf{Z}^+\to\mathbf{Z}^+$.


8 Explain why 8.58 implies that if $\Gamma$ is a finite set and $\{e_k\}_{k\in\Gamma}$ is an orthonormal
family in a Hilbert space $V$, then $\operatorname{span}\{e_k\}_{k\in\Gamma}$ is a closed subspace of $V$.


9 Suppose $V$ is an infinite-dimensional Hilbert space. Prove that there does not
exist a basis of $V$ that is an orthonormal family.
10 (a) Show that the orthonormal family given in the first bullet point of Exam-
ple 8.51 is an orthonormal basis of $\ell^2$.

(b) Show that the orthonormal family given in the second bullet point of Exam-
ple 8.51 is an orthonormal basis of $\ell^2(\Gamma)$.

(c) Show that the orthonormal family given in the fourth bullet point of Exam-
ple 8.51 is not an orthonormal basis of $L^2([0,1))$.

(d) Show that the orthonormal family given in the fifth bullet point of Exam-
ple 8.51 is not an orthonormal basis of $L^2(\mathbf{R})$.


11 Suppose $\mu$ is a $\sigma$-finite measure on $(X,\mathcal{S})$ and $\nu$ is a $\sigma$-finite measure on $(Y,\mathcal{T})$.
Suppose also that $\{e_j\}_{j\in\Omega}$ is an orthonormal basis of $L^2(\mu)$ and $\{f_k\}_{k\in\Gamma}$ is an
orthonormal basis of $L^2(\nu)$. For $j\in\Omega$ and $k\in\Gamma$, define $g_{j,k}\colon X\times Y\to\mathbf{F}$ by

$$g_{j,k}(x,y) = e_j(x)f_k(y).$$

Prove that $\{g_{j,k}\}_{j\in\Omega,\,k\in\Gamma}$ is an orthonormal basis of $L^2(\mu\times\nu)$.


12 Prove the converse of Parseval's identity. More specifically, prove that if $\{e_k\}_{k\in\Gamma}$
is an orthonormal family in a Hilbert space $V$ and

$$\|f\|^2 = \sum_{k\in\Gamma}|\langle f, e_k\rangle|^2$$

for every $f\in V$, then $\{e_k\}_{k\in\Gamma}$ is an orthonormal basis of $V$.


13 (a) Show that the Hilbert space $L^2([0,1])$ is separable.
(b) Show that the Hilbert space $L^2(\mathbf{R})$ is separable.
(c) Show that the Banach space $\ell^\infty$ is not separable.


14 Prove that every subspace of a separable normed vector space is separable.


15 Suppose $V$ is an infinite-dimensional Hilbert space. Prove that there does not
exist a translation invariant measure on the Borel subsets of $V$ that assigns
positive but finite measure to each open ball in $V$.

[A subset of $V$ is called a Borel set if it is in the smallest $\sigma$-algebra containing
all the open subsets of $V$. A measure $\mu$ on the Borel subsets of $V$ is called
translation invariant if $\mu(f + E) = \mu(E)$ for every $f\in V$ and every Borel set
$E$ of $V$.]


16 Find the polynomial $g$ of degree at most 4 that minimizes $\int_0^1\big|x^5 - g(x)\big|^2\,dx$.


17 Prove that each orthonormal family in a Hilbert space can be extended to
an orthonormal basis of the Hilbert space. Specifically, suppose $\{e_j\}_{j\in\Omega}$ is
an orthonormal family in a Hilbert space $V$. Prove that there exists a set $\Gamma$
containing $\Omega$ and an orthonormal basis $\{f_k\}_{k\in\Gamma}$ of $V$ such that $f_j = e_j$ for every
$j\in\Omega$.


18 Prove that every vector space has a basis.


19 Find the polynomial $g$ of degree at most 4 such that

$$f\big(\tfrac12\big) = \int_0^1 fg$$

for every polynomial $f$ of degree at most 4.


Exercises 20–25 are for readers familiar with analytic functions.


20 Suppose $G$ is a nonempty open subset of $\mathbf{C}$. The Bergman space $L^2_a(G)$ is
defined to be the set of analytic functions $f\colon G\to\mathbf{C}$ such that

$$\int_G|f|^2\,d\lambda_2<\infty,$$

where $\lambda_2$ is the usual Lebesgue measure on $\mathbf{R}^2$, which is identified with $\mathbf{C}$. For
$f, h\in L^2_a(G)$, define $\langle f, h\rangle$ to be $\int_G f\overline{h}\,d\lambda_2$.

(a) Show that $L^2_a(G)$ is a Hilbert space.

(b) Show that if $w\in G$, then $f\mapsto f(w)$ is a bounded linear functional on
$L^2_a(G)$.


21 Let $\mathbf{D}$ denote the open unit disk in $\mathbf{C}$; thus

$$\mathbf{D} = \{z\in\mathbf{C} : |z|<1\}.$$

(a) Find an orthonormal basis of $L^2_a(\mathbf{D})$.
(b) Suppose $f\in L^2_a(\mathbf{D})$ has Taylor series

$$f(z) = \sum_{k=0}^{\infty} a_k z^k$$

for $z\in\mathbf{D}$. Find a formula for $\|f\|$ in terms of $a_0, a_1, a_2,\dots$.

(c) Suppose $w\in\mathbf{D}$. By the previous exercise and the Riesz Representation
Theorem (8.47 and 8.76), there exists $\Gamma_w\in L^2_a(\mathbf{D})$ such that

$$f(w) = \langle f, \Gamma_w\rangle \quad\text{for all } f\in L^2_a(\mathbf{D}).$$

Find an explicit formula for $\Gamma_w$.


22 Suppose $G$ is the annulus defined by

$$G = \{z\in\mathbf{C} : 1<|z|<3\}.$$

(a) Find an orthonormal basis of $L^2_a(G)$.
(b) Suppose $f\in L^2_a(G)$ has Laurent series

$$f(z) = \sum_{k=-\infty}^{\infty} a_k z^k$$

for $z\in G$. Find a formula for $\|f\|$ in terms of $\dots, a_{-1}, a_0, a_1,\dots$.
23 Prove that if $f\in L^2_a(\mathbf{D}\setminus\{0\})$, then $f$ has a removable singularity at 0 (meaning
that $f$ can be extended to a function that is analytic on $\mathbf{D}$).


24 The Dirichlet space $\mathcal{D}$ is defined to be the set of analytic functions $f\colon\mathbf{D}\to\mathbf{C}$
such that

$$\int_{\mathbf{D}}|f'|^2\,d\lambda_2<\infty.$$

For $f, g\in\mathcal{D}$, define $\langle f, g\rangle$ to be $f(0)\overline{g(0)} + \int_{\mathbf{D}} f'\,\overline{g'}\,d\lambda_2$.

(a) Show that $\mathcal{D}$ is a Hilbert space.

(b) Show that if $w\in\mathbf{D}$, then $f\mapsto f(w)$ is a bounded linear functional on $\mathcal{D}$.

(c) Find an orthonormal basis of $\mathcal{D}$.

(d) Suppose $f\in\mathcal{D}$ has Taylor series

$$f(z) = \sum_{k=0}^{\infty} a_k z^k$$

for $z\in\mathbf{D}$. Find a formula for $\|f\|$ in terms of $a_0, a_1, a_2,\dots$.

(e) Suppose $w\in\mathbf{D}$. Find an explicit formula for $\Gamma_w\in\mathcal{D}$ such that

$$f(w) = \langle f, \Gamma_w\rangle \quad\text{for all } f\in\mathcal{D}.$$


25 (a) Prove that the Dirichlet space $\mathcal{D}$ is contained in the Bergman space $L^2_a(\mathbf{D})$.

(b) Prove that there exists a function $f\in L^2_a(\mathbf{D})$ such that $f$ is uniformly
continuous on $\mathbf{D}$ and $f\notin\mathcal{D}$.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 
4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial 
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give 
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license 
and indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not included 
in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation 
or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 



Chapter 9 


A measure is a countably additive function from a $\sigma$-algebra to $[0,\infty]$. In this chapter,
we consider countably additive functions from a $\sigma$-algebra to either $\mathbf{R}$ or $\mathbf{C}$. The first
section of this chapter shows that these functions, called real measures or complex
measures, form an interesting Banach space with an appropriate norm.

The second section of this chapter focuses on decomposition theorems that help
us understand real and complex measures. These results will lead to a proof that the
dual space of $L^p(\mu)$ can be identified with $L^{p'}(\mu)$.


Dome in the main building of the University of Vienna, where Johann Radon
(1887–1956) was a student and then later a faculty member. The Radon–Nikodym
Theorem, which will be proved in this chapter using Hilbert space techniques,
provides information analogous to differentiation for measures.
9A Total Variation


Properties of Real and Complex Measures 


Recall that a measurable space is a pair $(X,\mathcal{S})$, where $\mathcal{S}$ is a $\sigma$-algebra on $X$. Recall
also that a measure on $(X,\mathcal{S})$ is a countably additive function from $\mathcal{S}$ to $[0,\infty]$ that
takes $\emptyset$ to 0. Countably additive functions that take values in $\mathbf{R}$ or $\mathbf{C}$ give us new
objects called real measures or complex measures.


9.1 Definition real and complex measures 


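Suppose $(X,\mathcal{S})$ is a measurable space.

• A function $\nu\colon\mathcal{S}\to\mathbf{F}$ is called countably additive if

$$\nu\Big(\bigcup_{k=1}^{\infty} E_k\Big) = \sum_{k=1}^{\infty}\nu(E_k)$$

for every disjoint sequence $E_1, E_2,\dots$ of sets in $\mathcal{S}$.

• A real measure on $(X,\mathcal{S})$ is a countably additive function $\nu\colon\mathcal{S}\to\mathbf{R}$.

• A complex measure on $(X,\mathcal{S})$ is a countably additive function $\nu\colon\mathcal{S}\to\mathbf{C}$.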


The word measure can be ambiguous 
in the mathematical literature. The most 
common use of the word measure is as 
we defined it in Chapter 2 (see 2.54). 
However, some mathematicians use the 
word measure to include what are here 
called real and complex measures; they 
then use the phrase positive measure to 
refer to what we defined as a measure in 
2.54. To help relieve this ambiguity, in this chapter we usually use the phrase 
(positive) measure to refer to measures as defined in 2.54. Putting positive in paren- 
theses helps reinforce the idea that it is optional while distinguishing such measures 
from real and complex measures. 


The terminology nonnegative
measure would be more appropriate
than positive measure because the
function $\mu\colon\mathcal{S}\to\mathbf{F}$ defined by
$\mu(E) = 0$ for every $E\in\mathcal{S}$ is a
positive measure. However, we will
stick with tradition and use the
phrase positive measure.


9.2 Example real and complex measures


• Let $\lambda$ denote Lebesgue measure on $[-1,1]$. Define $\nu$ on the Borel subsets of
$[-1,1]$ by

$$\nu(E) = \lambda\big(E\cap[0,1]\big) - \lambda\big(E\cap[-1,0)\big).$$

Then $\nu$ is a real measure.

• If $\mu_1$ and $\mu_2$ are finite (positive) measures, then $\mu_1 - \mu_2$ is a real measure and
$\alpha_1\mu_1 + \alpha_2\mu_2$ is a complex measure for all $\alpha_1, \alpha_2\in\mathbf{C}$.

• If $\nu$ is a complex measure, then $\operatorname{Re}\nu$ and $\operatorname{Im}\nu$ are real measures.
Note that every real measure is a complex measure. Note also that by definition,
$\infty$ is not an allowable value for a real or complex measure. Thus a (positive) measure
$\mu$ on $(X,\mathcal{S})$ is a real measure if and only if $\mu(X)<\infty$.

Some authors use the terminology signed measure instead of real measure; some
authors allow a real measure to take on the value $\infty$ or $-\infty$ (but not both, because the
expression $\infty - \infty$ must be avoided). However, real measures as defined here serve
us better because we need to avoid $\pm\infty$ when considering the Banach space of real
or complex measures on a measurable space (see 9.18).

For (positive) measures, we had to make $\mu(\emptyset) = 0$ part of the definition to avoid
the function $\mu$ that assigns $\infty$ to all sets, including the empty set. But $\infty$ is not an
allowable value for real or complex measures. Thus $\nu(\emptyset) = 0$ is a consequence of
our definition rather than part of the definition, as shown in the next result.


9.3 absolute convergence for a disjoint union 


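Suppose $\nu$ is a complex measure on a measurable space $(X,\mathcal{S})$. Then

(a) $\nu(\emptyset) = 0$;

(b) $\displaystyle\sum_{k=1}^{\infty}|\nu(E_k)|<\infty$ for every disjoint sequence $E_1, E_2,\dots$ of sets in $\mathcal{S}$.


Proof To prove (a), note that $\emptyset, \emptyset,\dots$ is a disjoint sequence of sets in $\mathcal{S}$ whose
union equals $\emptyset$. Thus

$$\nu(\emptyset) = \sum_{k=1}^{\infty}\nu(\emptyset).$$

The right side of the equation above makes sense as an element of $\mathbf{R}$ or $\mathbf{C}$ only when
$\nu(\emptyset) = 0$, which proves (a).

To prove (b), suppose $E_1, E_2,\dots$ is a disjoint sequence of sets in $\mathcal{S}$. First suppose
$\nu$ is a real measure. Thus

$$\nu\Big(\bigcup_{\{k:\,\nu(E_k)>0\}} E_k\Big) = \sum_{\{k:\,\nu(E_k)>0\}}\nu(E_k) = \sum_{\{k:\,\nu(E_k)>0\}}|\nu(E_k)|$$

and

$$-\nu\Big(\bigcup_{\{k:\,\nu(E_k)<0\}} E_k\Big) = -\sum_{\{k:\,\nu(E_k)<0\}}\nu(E_k) = \sum_{\{k:\,\nu(E_k)<0\}}|\nu(E_k)|.$$

Because $\nu(E)\in\mathbf{R}$ for every $E\in\mathcal{S}$, the left side, and hence the right side, of each of the
last two displayed equations is finite. Thus $\sum_{k=1}^{\infty}|\nu(E_k)|<\infty$, as desired.
Now consider the case where $\nu$ is a complex measure. Then

$$\sum_{k=1}^{\infty}|\nu(E_k)| \le \sum_{k=1}^{\infty}\big(|(\operatorname{Re}\nu)(E_k)| + |(\operatorname{Im}\nu)(E_k)|\big) < \infty,$$

where the last inequality follows from applying the result for real measures to the
real measures $\operatorname{Re}\nu$ and $\operatorname{Im}\nu$.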
The next result provides an important class of examples of real and complex measures.


9.4 measure determined by an ℒ¹-function

Suppose µ is a (positive) measure on a measurable space (X,S) and h ∈ ℒ¹(µ). Define ν: S → F by

$$\nu(E) = \int_E h \, d\mu.$$

Then ν is a real measure on (X,S) if F = R and is a complex measure on (X,S) if F = C.

Proof Suppose E1, E2, ... is a disjoint sequence of sets in S. Then

$$\nu\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) = \int \Bigl(\sum_{k=1}^{\infty} \chi_{E_k}(x)\Bigr) h(x) \, d\mu(x) = \sum_{k=1}^{\infty} \int \chi_{E_k} h \, d\mu = \sum_{k=1}^{\infty} \nu(E_k), \tag{9.5}$$

where the first equality holds because the sets E1, E2, ... are disjoint and the second equality follows from the inequality

$$\Bigl|\sum_{k=1}^{m} \chi_{E_k}(x) h(x)\Bigr| \le |h(x)|,$$

valid for every m ∈ Z⁺ and every x ∈ X, which along with the assumption that h ∈ ℒ¹(µ) allows us to interchange the integral and the limit of the partial sums by the Dominated Convergence Theorem (3.31).

The countable additivity shown in 9.5 means ν is a real or complex measure.


The next definition simply gives a notation for the measure defined in the previous result. In the notation that we are about to define, the symbol d has no separate meaning; it functions to separate h and µ.


9.6 Definition h dµ

Suppose µ is a (positive) measure on a measurable space (X,S) and h ∈ ℒ¹(µ). Then h dµ is the real or complex measure on (X,S) defined by

$$(h \, d\mu)(E) = \int_E h \, d\mu.$$

Note that if a function h ∈ ℒ¹(µ) takes values in [0, ∞), then h dµ is a finite (positive) measure.
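As a numerical illustration of our own, the real measure ν of Example 9.2 is h dλ for the function h that equals 1 on [0, 1] and −1 on [−1, 0). The Python sketch below approximates (h dλ)([a, b]) with a midpoint Riemann sum; the helper names and the choice of 100 000 subintervals are assumptions made for illustration, and the printed values are approximations rather than exact integrals.

```python
def h(x):
    """h = 1 on [0, 1] and -1 on [-1, 0), so that h dλ is the ν of Example 9.2."""
    return 1.0 if x >= 0 else -1.0


def integral(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def nu(a, b):
    """Approximate value of (h dλ)([a, b]) for -1 <= a <= b <= 1."""
    return integral(h, a, b)


print(nu(-1, 1))    # close to λ([0, 1]) - λ([-1, 0)) = 0
print(nu(-1, 0.5))  # close to 0.5 - 1 = -0.5
```

Because h has a jump at 0, the midpoint rule can be off by about one subinterval width on intervals containing 0; that is why the values are described only as close to the exact ones.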

The next result shows some basic properties of complex measures. No proofs are given because the proofs are the same as the proofs of the corresponding results for (positive) measures. Specifically, see the proofs of 2.57, 2.61, 2.59, and 2.60. Because complex measures cannot take on the value ∞, we do not need to worry about hypotheses of finite measure that are required of the (positive) measure versions of all but part (c).
9.7 properties of complex measures

Suppose ν is a complex measure on a measurable space (X,S). Then

(a) ν(E \ D) = ν(E) − ν(D) for all D, E ∈ S with D ⊂ E;

(b) ν(D ∪ E) = ν(D) + ν(E) − ν(D ∩ E) for all D, E ∈ S;

(c) $\nu\bigl(\bigcup_{k=1}^{\infty} E_k\bigr) = \lim_{k\to\infty} \nu(E_k)$ for all increasing sequences E1 ⊂ E2 ⊂ ··· of sets in S;

(d) $\nu\bigl(\bigcap_{k=1}^{\infty} E_k\bigr) = \lim_{k\to\infty} \nu(E_k)$ for all decreasing sequences E1 ⊃ E2 ⊃ ··· of sets in S.


Total Variation Measure 


We use the terminology total variation measure below even though we have not 
yet shown that the object being defined is a measure. Soon we will justify this 
terminology (see 9.11). 


9.8 Definition total variation measure 


Suppose ν is a complex measure on a measurable space (X,S). The total variation measure is the function |ν|: S → [0, ∞] defined by

$$|\nu|(E) = \sup\bigl\{ |\nu(E_1)| + \cdots + |\nu(E_n)| : n \in \mathbf{Z}^+ \text{ and } E_1, \ldots, E_n \text{ are disjoint sets in } \mathcal{S} \text{ such that } E_1 \cup \cdots \cup E_n \subset E \bigr\}.$$


To start getting familiar with the definition above, you should verify that if ν is a complex measure on (X,S) and E ∈ S, then

• |ν(E)| ≤ |ν|(E);
• |ν|(E) = ν(E) if ν is a finite (positive) measure;
• |ν|(E) = 0 if and only if ν(A) = 0 for every A ∈ S such that A ⊂ E.


The next result states that for real measures, we can consider only n = 2 in the 
definition of the total variation measure. 


9.9 total variation measure of a real measure 


Suppose ν is a real measure on a measurable space (X,S) and E ∈ S. Then

$$|\nu|(E) = \sup\{ |\nu(A)| + |\nu(B)| : A, B \text{ are disjoint sets in } \mathcal{S} \text{ and } A \cup B \subset E \}.$$
Proof Suppose that n ∈ Z⁺ and E1, ..., En are disjoint sets in S such that E1 ∪ ··· ∪ En ⊂ E. Let

$$A = \bigcup_{\{k : \nu(E_k) > 0\}} E_k \quad\text{and}\quad B = \bigcup_{\{k : \nu(E_k) < 0\}} E_k.$$

Then A, B are disjoint sets in S and A ∪ B ⊂ E. Furthermore,

$$|\nu(A)| + |\nu(B)| = |\nu(E_1)| + \cdots + |\nu(E_n)|.$$

Thus in the supremum that defines |ν|(E), we can take n = 2.
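The reduction to n = 2 can be checked by brute force for a real measure on a small finite set. In the Python sketch below (an illustration under our own assumptions), X has four points, S is its power set, the measure is stored by its values on singletons, and the hypothetical helper `tv_brute_force` tries every way of choosing disjoint sets E1, ..., En ⊂ E by giving each point a label (0 for "in no Ej", j ≥ 1 for "in Ej"). The search is exponential in the size of E, so it is only a sanity check, not a practical algorithm.

```python
from itertools import product


def tv_brute_force(weights, E, max_blocks=None):
    """Max of |ν(E_1)| + ... + |ν(E_n)| over disjoint E_1, ..., E_n ⊂ E.

    weights[x] is ν({x}). Label 0 means "x is in no E_j"; label j >= 1 means
    "x is in E_j". With max_blocks=2 this is the supremum appearing in 9.9.
    """
    E = list(E)
    n = len(E) if max_blocks is None else max_blocks  # len(E) blocks always suffice
    best = 0.0
    for labels in product(range(n + 1), repeat=len(E)):
        block_values = [0.0] * (n + 1)  # block_values[j] = ν(E_j); index 0 is discarded
        for x, label in zip(E, labels):
            block_values[label] += weights[x]
        best = max(best, sum(abs(v) for v in block_values[1:]))
    return best


weights = {"a": 2.0, "b": -1.0, "c": 3.0, "d": -4.0}  # a real measure on {a, b, c, d}
X = weights.keys()
print(tv_brute_force(weights, X))                # 10.0, all finite disjoint families
print(tv_brute_force(weights, X, max_blocks=2))  # 10.0, only pairs A, B as in 9.9
```

The pair achieving the maximum is exactly the one built in the proof: A collects the points of positive weight and B the points of negative weight.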


The next result could be rephrased as stating that if h ∈ ℒ¹(µ), then the total variation measure of the measure h dµ is the measure |h| dµ. In the statement below, the notation dν = h dµ means the same as ν = h dµ; the notation dν is commonly used when considering expressions involving measures of the form h dµ.


9.10 total variation measure of h dµ

Suppose µ is a (positive) measure on a measurable space (X,S), h ∈ ℒ¹(µ), and dν = h dµ. Then

$$|\nu|(E) = \int_E |h| \, d\mu$$

for every E ∈ S.


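Proof Suppose that E ∈ S. If E1, ..., En is a disjoint sequence in S such that E1 ∪ ··· ∪ En ⊂ E, then

$$\sum_{k=1}^{n} |\nu(E_k)| = \sum_{k=1}^{n} \Bigl| \int_{E_k} h \, d\mu \Bigr| \le \sum_{k=1}^{n} \int_{E_k} |h| \, d\mu \le \int_E |h| \, d\mu.$$

The inequality above implies that |ν|(E) ≤ ∫_E |h| dµ.

To prove the inequality in the other direction, first suppose F = R; thus h is a real-valued function and ν is a real measure. Let

$$A = \{x \in E : h(x) > 0\} \quad\text{and}\quad B = \{x \in E : h(x) < 0\}.$$

Then A and B are disjoint sets in S and A ∪ B ⊂ E. We have

$$|\nu(A)| + |\nu(B)| = \int_A h \, d\mu - \int_B h \, d\mu = \int_E |h| \, d\mu.$$

Thus |ν|(E) ≥ ∫_E |h| dµ, completing the proof in the case F = R.

Now suppose F = C; thus ν is a complex measure. Let ε > 0. There exists a simple function g ∈ ℒ¹(µ) such that ‖g − h‖₁ < ε (by 3.44). There exist disjoint sets E1, ..., En ∈ S and c1, ..., cn ∈ C such that E1 ∪ ··· ∪ En ⊂ E and

$$g|_E = \sum_{k=1}^{n} c_k \chi_{E_k}.$$

Now

$$\begin{aligned}
\sum_{k=1}^{n} |\nu(E_k)| &= \sum_{k=1}^{n} \Bigl| \int_{E_k} h \, d\mu \Bigr| \\
&\ge \sum_{k=1}^{n} \Bigl| \int_{E_k} g \, d\mu \Bigr| - \sum_{k=1}^{n} \Bigl| \int_{E_k} (g - h) \, d\mu \Bigr| \\
&= \sum_{k=1}^{n} |c_k| \, \mu(E_k) - \sum_{k=1}^{n} \Bigl| \int_{E_k} (g - h) \, d\mu \Bigr| \\
&= \int_E |g| \, d\mu - \sum_{k=1}^{n} \Bigl| \int_{E_k} (g - h) \, d\mu \Bigr| \\
&\ge \int_E |g| \, d\mu - \sum_{k=1}^{n} \int_{E_k} |g - h| \, d\mu \\
&\ge \int_E |h| \, d\mu - 2\varepsilon,
\end{aligned}$$

where the last line uses ‖g − h‖₁ < ε twice: once to compare ∫_E |g| dµ with ∫_E |h| dµ and once to bound the subtracted sum. The inequality above implies that |ν|(E) ≥ ∫_E |h| dµ − 2ε. Because ε is an arbitrary positive number, this implies |ν|(E) ≥ ∫_E |h| dµ, completing the proof.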
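Here is a numerical look at the inequality in the other direction for one complex-valued h (the example is ours): take λ to be Lebesgue measure on [0, t] and h(x) = e^{ix}, so ∫_{[0,t]} |h| dλ = t. For an interval [a, b], ν([a, b]) = (e^{ib} − e^{ia})/i, so splitting [0, t] into n equal intervals gives the sum 2n sin(t/(2n)) whenever t < 2πn, and this approaches t as n grows. The Python sketch below, with hypothetical helper names, computes these sums from the exact formula for ν on intervals.

```python
import cmath
import math


def nu_interval(a, b):
    """ν([a, b]) for dν = e^{ix} dλ, i.e. the exact integral of e^{ix} over [a, b]."""
    return (cmath.exp(1j * b) - cmath.exp(1j * a)) / 1j


def partition_sum(t, n):
    """|ν(E_1)| + ... + |ν(E_n)| where E_1, ..., E_n are n equal pieces of [0, t].

    Neighboring intervals share an endpoint, but single points have λ-measure 0,
    so they do not affect the values of ν.
    """
    points = [t * k / n for k in range(n + 1)]
    return sum(abs(nu_interval(points[k], points[k + 1])) for k in range(n))


t = 2.0
for n in (1, 2, 10, 1000):
    # Each sum equals 2n sin(t/(2n)): below t = ∫|h| dλ, but close for large n.
    print(n, partition_sum(t, n), 2 * n * math.sin(t / (2 * n)))
```

The sums stay strictly below t for these partitions, which is consistent with the supremum in the definition of |ν| being approached but not attained (a question taken up in the exercises).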


Now we justify the terminology total variation measure. 


9.11 total variation measure is a measure 


Suppose ν is a complex measure on a measurable space (X,S). Then the total variation function |ν| is a (positive) measure on (X,S).


Proof The definition of |ν| and 9.3(a) imply that |ν|(∅) = 0.

To show that |ν| is countably additive, suppose A1, A2, ... are disjoint sets in S. Fix m ∈ Z⁺. For each k ∈ {1, ..., m}, suppose E_{1,k}, ..., E_{n_k,k} are disjoint sets in S such that

$$E_{1,k} \cup \cdots \cup E_{n_k,k} \subset A_k. \tag{9.12}$$

Then {E_{j,k} : 1 ≤ k ≤ m and 1 ≤ j ≤ n_k} is a disjoint collection of sets in S that are all contained in $\bigcup_{k=1}^{\infty} A_k$. Hence

$$\sum_{k=1}^{m} \sum_{j=1}^{n_k} |\nu(E_{j,k})| \le |\nu|\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr).$$

Taking the supremum of the left side of the inequality above over all choices of {E_{j,k}} satisfying 9.12 shows that

$$\sum_{k=1}^{m} |\nu|(A_k) \le |\nu|\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr).$$

Because the inequality above holds for all m ∈ Z⁺, we have

$$\sum_{k=1}^{\infty} |\nu|(A_k) \le |\nu|\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr).$$
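To prove the inequality above in the other direction, suppose E1, ..., En ∈ S are disjoint sets such that $E_1 \cup \cdots \cup E_n \subset \bigcup_{k=1}^{\infty} A_k$. Then

$$\begin{aligned}
\sum_{k=1}^{\infty} |\nu|(A_k) &\ge \sum_{k=1}^{\infty} \sum_{j=1}^{n} |\nu(E_j \cap A_k)| \\
&= \sum_{j=1}^{n} \sum_{k=1}^{\infty} |\nu(E_j \cap A_k)| \\
&\ge \sum_{j=1}^{n} \Bigl| \sum_{k=1}^{\infty} \nu(E_j \cap A_k) \Bigr| \\
&= \sum_{j=1}^{n} |\nu(E_j)|,
\end{aligned}$$

where the first line above follows from the definition of |ν|(A_k) and the last line above follows from the countable additivity of ν. The inequality above and the definition of $|\nu|\bigl(\bigcup_{k=1}^{\infty} A_k\bigr)$ imply that

$$\sum_{k=1}^{\infty} |\nu|(A_k) \ge |\nu|\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr),$$

completing the proof.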


The Banach Space of Measures 


In this subsection, we make the set of complex or real measures on a measurable 
space into a vector space and then into a Banach space. 


Suppose (X,S) is a measurable space. For complex measures ν, µ on (X,S) and α ∈ F, define complex measures ν + µ and αν on (X,S) by

$$(\nu + \mu)(E) = \nu(E) + \mu(E) \quad\text{and}\quad (\alpha\nu)(E) = \alpha\bigl(\nu(E)\bigr).$$

You should verify that if ν, µ, and α are as above, then ν + µ and αν are complex measures on (X,S). You should also verify that these natural definitions of addition and scalar multiplication make the set of complex (or real) measures on a measurable space (X,S) into a vector space. We now introduce notation for this vector space.

Suppose (X,S) is a measurable space. Then M_F(S) denotes the vector space of real measures on (X,S) if F = R and denotes the vector space of complex measures on (X,S) if F = C.
We use the terminology total variation norm below even though we have not yet shown that the object being defined is a norm (especially because it is not obvious that ‖ν‖ < ∞ for every complex measure ν). Soon we will justify this terminology.


9.15 Definition total variation norm of a complex measure

Suppose ν is a complex measure on a measurable space (X,S). The total variation norm of ν, denoted ‖ν‖, is defined by

$$\|\nu\| = |\nu|(X).$$


9.16 Example total variation norm

• If µ is a finite (positive) measure, then ‖µ‖ = µ(X), as you should verify.

• If µ is a (positive) measure, h ∈ ℒ¹(µ), and dν = h dµ, then ‖ν‖ = ‖h‖₁ (as follows from 9.10).


The next result implies that if ν is a complex measure on a measurable space (X,S), then |ν|(E) < ∞ for every E ∈ S.


9.17 total variation norm is finite 


Suppose (X,S) is a measurable space and ν ∈ M_F(S). Then ‖ν‖ < ∞.

Proof First consider the case where F = R. Thus ν is a real measure on (X,S). To begin this proof by contradiction, suppose ‖ν‖ = |ν|(X) = ∞.

We inductively choose a decreasing sequence E0 ⊃ E1 ⊃ E2 ⊃ ··· of sets in S as follows: Start by choosing E0 = X. Now suppose n ≥ 0 and E_n ∈ S has been chosen with |ν|(E_n) = ∞ and |ν(E_n)| ≥ n. Because |ν|(E_n) = ∞, 9.9 implies that there exists A ∈ S such that A ⊂ E_n and |ν(A)| ≥ n + 1 + |ν(E_n)| (9.9 provides disjoint sets in S contained in E_n whose values of |ν(·)| sum to at least 2(n + 1 + |ν(E_n)|), and one of these two sets can serve as A), which implies that

$$|\nu(E_n \setminus A)| = |\nu(E_n) - \nu(A)| \ge |\nu(A)| - |\nu(E_n)| \ge n + 1.$$

Now

$$|\nu|(A) + |\nu|(E_n \setminus A) = |\nu|(E_n) = \infty$$

because the total variation measure |ν| is a (positive) measure (by 9.11). The equation above shows that at least one of |ν|(A) and |ν|(E_n \ A) is ∞. Let E_{n+1} = A if |ν|(A) = ∞ and let E_{n+1} = E_n \ A if |ν|(A) < ∞. Thus E_n ⊃ E_{n+1}, |ν|(E_{n+1}) = ∞, and |ν(E_{n+1})| ≥ n + 1.

Now 9.7(d) implies that $\nu\bigl(\bigcap_{n=1}^{\infty} E_n\bigr) = \lim_{n\to\infty} \nu(E_n)$. However, |ν(E_n)| ≥ n for each n ∈ Z⁺, and thus the limit in the last equation does not exist (in R). This contradiction completes the proof in the case where ν is a real measure.

Consider now the case where F = C; thus ν is a complex measure on (X,S). Then

$$|\nu|(X) \le |\operatorname{Re}\nu|(X) + |\operatorname{Im}\nu|(X) < \infty,$$

where the last inequality follows from applying the real case to Re ν and Im ν.
The previous result tells us that if (X,S) is a measurable space, then ‖ν‖ < ∞ for all ν ∈ M_F(S). This implies (as the reader should verify) that the total variation norm ‖·‖ is a norm on M_F(S). The next result shows that this norm makes M_F(S) into a Banach space (in other words, every Cauchy sequence in this norm converges).


9.18 the set of real or complex measures on (X,S) is a Banach space 


Suppose (X,S) is a measurable space. Then M_F(S) is a Banach space with the total variation norm.


Proof Suppose ν1, ν2, ... is a Cauchy sequence in M_F(S). For each E ∈ S, we have

$$|\nu_j(E) - \nu_k(E)| = |(\nu_j - \nu_k)(E)| \le |\nu_j - \nu_k|(E) \le \|\nu_j - \nu_k\|.$$

Thus ν1(E), ν2(E), ... is a Cauchy sequence in F and hence converges. Thus we can define a function ν: S → F by

$$\nu(E) = \lim_{j\to\infty} \nu_j(E).$$

To show that ν ∈ M_F(S), we must verify that ν is countably additive. To do this, suppose E1, E2, ... is a disjoint sequence of sets in S. Let ε > 0. Let m ∈ Z⁺ be such that

$$\|\nu_j - \nu_k\| \le \varepsilon \quad \text{for all } j, k \ge m. \tag{9.19}$$

If n ∈ Z⁺ is such that

$$\sum_{k=n}^{\infty} |\nu_m(E_k)| \le \varepsilon \tag{9.20}$$

[such an n exists by applying 9.3(b) to ν_m] and if j ≥ m, then

$$\begin{aligned}
\sum_{k=n}^{\infty} |\nu_j(E_k)| &\le \sum_{k=n}^{\infty} |(\nu_j - \nu_m)(E_k)| + \sum_{k=n}^{\infty} |\nu_m(E_k)| \\
&\le \sum_{k=n}^{\infty} |\nu_j - \nu_m|(E_k) + \varepsilon \\
&= |\nu_j - \nu_m|\Bigl(\bigcup_{k=n}^{\infty} E_k\Bigr) + \varepsilon \\
&\le 2\varepsilon,
\end{aligned} \tag{9.21}$$

where the second line uses 9.20 (together with |(ν_j − ν_m)(E_k)| ≤ |ν_j − ν_m|(E_k)), the third line uses the countable additivity of the measure |ν_j − ν_m| (see 9.11), and the fourth line uses 9.19.
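If ε and n are as in the paragraph above, then

$$\begin{aligned}
\Bigl| \nu\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) - \sum_{k=1}^{n-1} \nu(E_k) \Bigr| &= \Bigl| \lim_{j\to\infty} \nu_j\Bigl(\bigcup_{k=1}^{\infty} E_k\Bigr) - \lim_{j\to\infty} \sum_{k=1}^{n-1} \nu_j(E_k) \Bigr| \\
&= \lim_{j\to\infty} \Bigl| \sum_{k=n}^{\infty} \nu_j(E_k) \Bigr| \\
&\le 2\varepsilon,
\end{aligned}$$

where the second line uses the countable additivity of the measure ν_j and the third line uses 9.21. Because 9.20 remains true when n is replaced by any larger integer, the inequality above holds for all sufficiently large n. Thus $\nu\bigl(\bigcup_{k=1}^{\infty} E_k\bigr) = \sum_{k=1}^{\infty} \nu(E_k)$, completing the proof that ν ∈ M_F(S).

We still need to prove that lim_{j→∞} ‖ν − ν_j‖ = 0. To do this, suppose ε > 0. Let m ∈ Z⁺ be such that

$$\|\nu_j - \nu_k\| \le \varepsilon \quad \text{for all } j, k \ge m. \tag{9.22}$$

Suppose k ≥ m. Suppose also that E1, ..., En ∈ S are disjoint subsets of X. Then

$$\sum_{l=1}^{n} |(\nu - \nu_k)(E_l)| = \lim_{j\to\infty} \sum_{l=1}^{n} |(\nu_j - \nu_k)(E_l)| \le \varepsilon,$$

where the last inequality follows from 9.22 and the definition of the total variation norm. The inequality above implies that ‖ν − ν_k‖ ≤ ε, completing the proof.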


EXERCISES 9A 


1 Prove or give a counterexample: If ν is a real measure on a measurable space (X,S) and A, B ∈ S are such that ν(A) ≥ 0 and ν(B) ≥ 0, then ν(A ∪ B) ≥ 0.

2 Suppose ν is a real measure on (X,S). Define µ: S → [0, ∞) by

$$\mu(E) = |\nu(E)|.$$

Prove that µ is a (positive) measure on (X,S) if and only if the range of ν is contained in [0, ∞) or the range of ν is contained in (−∞, 0].

3 Suppose ν is a complex measure on a measurable space (X,S). Prove that |ν|(X) = ν(X) if and only if ν is a (positive) measure.

4 Suppose ν is a complex measure on a measurable space (X,S). Prove that if E ∈ S then

$$|\nu|(E) = \sup\Bigl\{ \sum_{k=1}^{\infty} |\nu(E_k)| : E_1, E_2, \ldots \text{ is a disjoint sequence in } \mathcal{S} \text{ such that } E = \bigcup_{k=1}^{\infty} E_k \Bigr\}.$$
5 Suppose µ is a (positive) measure on a measurable space (X,S) and h is a nonnegative function in ℒ¹(µ). Let ν be the (positive) measure on (X,S) defined by dν = h dµ. Prove that

$$\int f \, d\nu = \int f h \, d\mu$$

for all S-measurable functions f: X → [0, ∞].

6 Suppose (X,S,µ) is a (positive) measure space. Prove that

$$\{h \, d\mu : h \in \mathcal{L}^1(\mu)\}$$

is a closed subspace of M_F(S).

7 (a) Suppose B is the collection of Borel subsets of R. Show that the Banach space M_F(B) is not separable.

(b) Give an example of a measurable space (X,S) such that the Banach space M_F(S) is infinite-dimensional and separable.

8 Suppose t > 0 and λ is Lebesgue measure on the σ-algebra of Borel subsets of [0, t]. Suppose h: [0, t] → C is the function defined by

$$h(x) = \cos x + i \sin x.$$

Let ν be the complex measure defined by dν = h dλ.

(a) Show that ‖ν‖ = t.

(b) Show that if E1, E2, ... is a sequence of disjoint Borel subsets of [0, t], then

$$\sum_{k=1}^{\infty} |\nu(E_k)| < t.$$

[This exercise shows that the supremum in the definition of |ν|([0, t]) is not attained, even if countably many disjoint sets are allowed.]

9 Give an example to show that 9.9 can fail if the hypothesis that ν is a real measure is replaced by the hypothesis that ν is a complex measure.

10 Suppose (X,S) is a measurable space with S ≠ {∅, X}. Prove that the total variation norm on M_F(S) does not come from an inner product. In other words, show that there does not exist an inner product ⟨·,·⟩ on M_F(S) such that ‖ν‖ = ⟨ν, ν⟩^{1/2} for all ν ∈ M_F(S), where ‖·‖ is the usual total variation norm on M_F(S).

11 For (X,S) a measurable space and b ∈ X, define a finite (positive) measure δ_b on (X,S) by

$$\delta_b(E) = \begin{cases} 1 & \text{if } b \in E, \\ 0 & \text{if } b \notin E \end{cases}$$

for E ∈ S.

(a) Show that if b, c ∈ X, then ‖δ_b + δ_c‖ = 2.

(b) Give an example of a measurable space (X,S) and b, c ∈ X with b ≠ c such that ‖δ_b − δ_c‖ ≠ 2.
9B Decomposition Theorems


Hahn Decomposition Theorem 


The next result shows that a real measure on a measurable space (X,S) decomposes 
X into two disjoint measurable sets such that every measurable subset of one of these 
two sets has nonnegative measure and every measurable subset of the other set has 
nonpositive measure. 

The decomposition in the result below is not unique because a subset D of X with |ν|(D) = 0 could be shifted from A to B or from B to A. However, Exercise 1 at the end of this section shows that the Hahn decomposition is almost unique.


9.23 Hahn Decomposition Theorem 


Suppose ν is a real measure on a measurable space (X,S). Then there exist sets A, B ∈ S such that

(a) A ∪ B = X and A ∩ B = ∅;

(b) ν(E) ≥ 0 for every E ∈ S with E ⊂ A;

(c) ν(E) ≤ 0 for every E ∈ S with E ⊂ B.


9.24 Example Hahn decomposition 


Suppose µ is a (positive) measure on a measurable space (X,S), h ∈ ℒ¹(µ) is real valued, and dν = h dµ. Then a Hahn decomposition of the real measure ν is obtained by setting

$$A = \{x \in X : h(x) \ge 0\} \quad\text{and}\quad B = \{x \in X : h(x) < 0\}.$$
Proof of 9.23 Let

$$a = \sup\{\nu(E) : E \in \mathcal{S}\}.$$

Thus a ≤ ‖ν‖ < ∞, where the last inequality comes from 9.17. For each j ∈ Z⁺, let A_j ∈ S be such that

$$\nu(A_j) \ge a - \frac{1}{2^j}. \tag{9.25}$$

Temporarily fix k ∈ Z⁺. We will show by induction on n that if n ∈ Z⁺ with n ≥ k, then

$$\nu\Bigl(\bigcup_{j=k}^{n} A_j\Bigr) \ge a - \sum_{j=k}^{n} \frac{1}{2^j}. \tag{9.26}$$

To get started with the induction, note that if n = k then 9.26 holds because in this case 9.26 becomes 9.25. Now for the induction step, assume that n ≥ k and that 9.26 holds. Then

$$\begin{aligned}
\nu\Bigl(\bigcup_{j=k}^{n+1} A_j\Bigr) &= \nu\Bigl(\bigcup_{j=k}^{n} A_j\Bigr) + \nu(A_{n+1}) - \nu\Bigl(\Bigl(\bigcup_{j=k}^{n} A_j\Bigr) \cap A_{n+1}\Bigr) \\
&\ge \Bigl(a - \sum_{j=k}^{n} \frac{1}{2^j}\Bigr) + \Bigl(a - \frac{1}{2^{n+1}}\Bigr) - a \\
&= a - \sum_{j=k}^{n+1} \frac{1}{2^j},
\end{aligned}$$

where the first line follows from 9.7(b) and the second line follows from 9.25, 9.26, and the definition of a (which bounds the subtracted term by a). We have now verified that 9.26 holds if n is replaced by n + 1, completing the proof by induction of 9.26.

The sequence of sets A_k, A_k ∪ A_{k+1}, A_k ∪ A_{k+1} ∪ A_{k+2}, ... is increasing. Thus taking the limit as n → ∞ of both sides of 9.26 and using 9.7(c) gives

$$\nu\Bigl(\bigcup_{j=k}^{\infty} A_j\Bigr) \ge a - \frac{1}{2^{k-1}}. \tag{9.27}$$

Now let

$$A = \bigcap_{k=1}^{\infty} \bigcup_{j=k}^{\infty} A_j.$$

The sequence of sets $\bigcup_{j=1}^{\infty} A_j, \bigcup_{j=2}^{\infty} A_j, \ldots$ is decreasing. Thus 9.27 and 9.7(d) imply that ν(A) ≥ a. The definition of a now implies that

$$\nu(A) = a.$$

Suppose E ∈ S and E ⊂ A. Then ν(A) = a ≥ ν(A \ E). Thus we have ν(E) = ν(A) − ν(A \ E) ≥ 0, which proves (b).

Let B = X \ A; thus (a) holds. Suppose E ∈ S and E ⊂ B. Then we have ν(A ∪ E) ≤ a = ν(A). Thus ν(E) = ν(A ∪ E) − ν(A) ≤ 0, which proves (c).
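For a real measure on a finite set, the conclusion of the proof can be checked directly. In the Python sketch below (our illustration, with hypothetical names and data), X = {p, q, r, s, t}, S is its power set, and ν is stored by its values on singletons. Then a = sup ν(E) is attained by the set of points of nonnegative weight, which plays the role of A, as in Example 9.24 with µ taken to be counting measure and h(x) = ν({x}). The point r has weight 0, so it can be moved from A to B, illustrating the nonuniqueness mentioned before 9.23.

```python
from itertools import combinations

weights = {"p": 1.5, "q": -2.0, "r": 0.0, "s": 4.0, "t": -0.5}  # ν({x}) for x in X


def nu(E):
    """ν(E) for a subset E of X."""
    return sum(weights[x] for x in E)


def all_subsets(points):
    points = list(points)
    return [set(c) for r in range(len(points) + 1) for c in combinations(points, r)]


def is_hahn_decomposition(A, B):
    """Check (a), (b), (c) of 9.23 by examining every subset of A and of B."""
    return (
        A | B == set(weights)
        and not (A & B)
        and all(nu(E) >= 0 for E in all_subsets(A))
        and all(nu(E) <= 0 for E in all_subsets(B))
    )


A = {x for x, w in weights.items() if w >= 0}  # points of nonnegative weight
B = set(weights) - A
subsets = all_subsets(weights)
a = max(nu(E) for E in subsets)                # a = sup{ν(E) : E ∈ S}

print(a, nu(A))
print(is_hahn_decomposition(A, B))
print(is_hahn_decomposition(A - {"r"}, B | {"r"}))  # moving r to B also works
```

The exhaustive check over all subsets is feasible only because X is tiny; it mirrors the statement of 9.23 rather than the limiting argument in its proof.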


Jordan Decomposition Theorem 


You should think of two complex or positive measures on a measurable space (X,S) 
as being singular with respect to each other if the two measures live on different sets. 
Here is the formal definition. 


9.28 Definition singular measures 


Suppose ν and µ are complex or positive measures on a measurable space (X,S). Then ν and µ are called singular with respect to each other, denoted ν ⊥ µ, if there exist sets A, B ∈ S such that

• A ∪ B = X and A ∩ B = ∅;

• ν(E) = ν(E ∩ A) and µ(E) = µ(E ∩ B) for all E ∈ S.
9.29 Example singular measures

Suppose λ is Lebesgue measure on the σ-algebra B of Borel subsets of R.

• Define positive measures ν, µ on (R, B) by

$$\nu(E) = |E \cap (-\infty, 0)| \quad\text{and}\quad \mu(E) = |E \cap (2, 3)|$$

for E ∈ B. Then ν ⊥ µ because ν lives on (−∞, 0) and µ lives on [0, ∞). Neither ν nor µ is singular with respect to λ.

• Let r1, r2, ... be a list of the rational numbers. Suppose w1, w2, ... is a sequence of complex numbers such that $\sum_{k=1}^{\infty} |w_k| < \infty$. Define a complex measure ν on (R, B) by

$$\nu(E) = \sum_{\{k \in \mathbf{Z}^+ : r_k \in E\}} w_k$$

for E ∈ B. Then ν ⊥ λ because ν lives on Q and λ lives on R \ Q.


The hard work for proving the next result has already been done in proving the 
Hahn Decomposition Theorem (9.23). 


9.30 Jordan Decomposition Theorem 


• Every real measure is the difference of two finite (positive) measures that are singular with respect to each other.

• More precisely, suppose ν is a real measure on a measurable space (X,S). Then there exist unique finite (positive) measures ν⁺ and ν⁻ on (X,S) such that

$$\nu = \nu^+ - \nu^- \quad\text{and}\quad \nu^+ \perp \nu^-. \tag{9.31}$$

Furthermore,

$$|\nu| = \nu^+ + \nu^-.$$


Proof Let X = A ∪ B be a Hahn decomposition of ν as in 9.23. Define functions ν⁺: S → [0, ∞) and ν⁻: S → [0, ∞) by

$$\nu^+(E) = \nu(E \cap A) \quad\text{and}\quad \nu^-(E) = -\nu(E \cap B).$$

The countable additivity of ν implies ν⁺ and ν⁻ are finite (positive) measures on (X,S), with ν = ν⁺ − ν⁻ and ν⁺ ⊥ ν⁻.

The definition of the total variation measure and 9.31 imply that

$$|\nu| = \nu^+ + \nu^-.$$

The equations ν = ν⁺ − ν⁻ and |ν| = ν⁺ + ν⁻ imply that

$$\nu^+ = \frac{|\nu| + \nu}{2} \quad\text{and}\quad \nu^- = \frac{|\nu| - \nu}{2}.$$

Thus the finite (positive) measures ν⁺ and ν⁻ are uniquely determined by ν and the conditions in 9.31.

Camille Jordan (1838–1922) is also known for certain matrices that are 0 except along the diagonal and the line above it.
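Continuing with a finite example of our own (hypothetical data, with S the power set of X), the Python sketch below builds ν⁺ and ν⁻ from the Hahn decomposition exactly as in the proof and checks, on every subset, that ν = ν⁺ − ν⁻, |ν| = ν⁺ + ν⁻, and ν⁺ = (|ν| + ν)/2. For a measure on a finite set given by point weights, |ν|(E) is the sum of |ν({x})| over x ∈ E; this is 9.10 applied with µ taken to be counting measure.

```python
from itertools import combinations

weights = {"p": 1.5, "q": -2.0, "s": 4.0, "t": -0.5}  # ν({x}) for x in X


def nu(E):
    return sum(weights[x] for x in E)


def total_variation(E):
    # |ν|(E) = sum of |ν({x})| over x in E (9.10 with counting measure).
    return sum(abs(weights[x]) for x in E)


A = {x for x, w in weights.items() if w >= 0}  # Hahn decomposition X = A ∪ B
B = set(weights) - A


def nu_plus(E):
    return nu(set(E) & A)    # ν⁺(E) = ν(E ∩ A)


def nu_minus(E):
    return -nu(set(E) & B)   # ν⁻(E) = −ν(E ∩ B)


subsets = [set(c) for r in range(len(weights) + 1) for c in combinations(weights, r)]
for E in subsets:
    assert abs(nu(E) - (nu_plus(E) - nu_minus(E))) < 1e-12
    assert abs(total_variation(E) - (nu_plus(E) + nu_minus(E))) < 1e-12
    assert abs(nu_plus(E) - (total_variation(E) + nu(E)) / 2) < 1e-12
print(nu_plus(weights), nu_minus(weights))  # ν⁺(X) and ν⁻(X)
```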
Lebesgue Decomposition Theorem


The next definition captures the notion of one measure having more sets of measure 0 
than another measure. 


9.32 Definition absolutely continuous; ≪

Suppose ν is a complex measure on a measurable space (X,S) and µ is a (positive) measure on (X,S). Then ν is called absolutely continuous with respect to µ, denoted ν ≪ µ, if

ν(E) = 0 for every set E ∈ S with µ(E) = 0.


9.33 Example absolute continuity

The reader should verify all the following examples:

• If µ is a (positive) measure and h ∈ ℒ¹(µ), then h dµ ≪ µ.

• If ν is a real measure, then ν⁺ ≪ |ν| and ν⁻ ≪ |ν|.

• If ν is a complex measure, then ν ≪ |ν|.

• If ν is a complex measure, then Re ν ≪ |ν| and Im ν ≪ |ν|.

• Every measure on a measurable space (X,S) is absolutely continuous with respect to counting measure on (X,S).


The next result should help you think that absolute continuity and singularity are 
two extreme possibilities for the relationship between two complex measures. 


9.34 absolutely continuous and singular implies 0 measure 


Suppose µ is a (positive) measure on a measurable space (X,S). Then the only complex measure on (X,S) that is both absolutely continuous and singular with respect to µ is the 0 measure.

Proof Suppose ν is a complex measure on (X,S) such that ν ≪ µ and ν ⊥ µ. Thus there exist sets A, B ∈ S such that A ∪ B = X, A ∩ B = ∅, and ν(E) = ν(E ∩ A) and µ(E) = µ(E ∩ B) for every E ∈ S.

Suppose E ∈ S. Then

$$\mu(E \cap A) = \mu\bigl((E \cap A) \cap B\bigr) = \mu(\varnothing) = 0.$$

Because ν ≪ µ, this implies that ν(E ∩ A) = 0. Thus ν(E) = 0. Hence ν is the 0 measure.


Our next result states that a (positive) measure on a measurable space (X,S) 
determines a decomposition of each complex measure on (X,S) as the sum of the 
two extreme types of complex measures (absolute continuity and singularity). 
9.35 Lebesgue Decomposition Theorem

Suppose µ is a (positive) measure on a measurable space (X,S).

• Every complex measure on (X,S) is the sum of a complex measure absolutely continuous with respect to µ and a complex measure singular with respect to µ.

• More precisely, suppose ν is a complex measure on (X,S). Then there exist unique complex measures ν_a and ν_s on (X,S) such that ν = ν_a + ν_s and

$$\nu_a \ll \mu \quad\text{and}\quad \nu_s \perp \mu.$$


Proof Let

$$b = \sup\{|\nu|(B) : B \in \mathcal{S} \text{ and } \mu(B) = 0\}.$$

Note that b ≤ ‖ν‖ < ∞ by 9.17. For each k ∈ Z⁺, let B_k ∈ S be such that

$$|\nu|(B_k) \ge b - \frac{1}{k} \quad\text{and}\quad \mu(B_k) = 0.$$

Let

$$B = \bigcup_{k=1}^{\infty} B_k.$$

Then µ(B) = 0 and |ν|(B) = b.

Let A = X \ B. Define complex measures ν_a and ν_s on (X,S) by

$$\nu_a(E) = \nu(E \cap A) \quad\text{and}\quad \nu_s(E) = \nu(E \cap B).$$

Clearly ν = ν_a + ν_s.

If E ∈ S, then

$$\mu(E) = \mu(E \cap A) + \mu(E \cap B) = \mu(E \cap A),$$

where the last equality holds because µ(B) = 0. The equation above, together with ν_s(E) = ν_s(E ∩ B), implies that ν_s ⊥ µ.

To prove that ν_a ≪ µ, suppose E ∈ S and µ(E) = 0. Then µ(B ∪ E) = 0 and hence

$$b \ge |\nu|(B \cup E) = |\nu|(B) + |\nu|(E \setminus B) = b + |\nu|(E \setminus B),$$

which implies that |ν|(E \ B) = 0. Thus

$$\nu_a(E) = \nu(E \cap A) = \nu(E \setminus B) = 0,$$

which implies that ν_a ≪ µ.

The construction of ν_a and ν_s shows that if ν is a positive (or real) measure, then so are ν_a and ν_s.

We have now proved all parts of this result except the uniqueness of the Lebesgue decomposition. To prove the uniqueness, suppose ν1 and ν2 are complex measures on (X,S) such that ν1 ≪ µ, ν2 ⊥ µ, and ν = ν1 + ν2. Then

$$\nu_1 - \nu_a = \nu_s - \nu_2.$$

The left side of the equation above is absolutely continuous with respect to µ and the right side is singular with respect to µ. Thus both sides are both absolutely continuous and singular with respect to µ. Thus 9.34 implies that ν1 = ν_a and ν2 = ν_s.
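On a finite set the construction in this proof becomes very concrete. In the Python sketch below (hypothetical data; S is the power set of X), µ gives weight 0 to some points, B is the set of those µ-null points (the largest µ-null set, so |ν|(B) = b), ν_s is ν restricted to B, and ν_a is ν restricted to A = X \ B. The code then checks ν = ν_a + ν_s, ν_a ≪ µ, and that A and B witness ν_s ⊥ µ.

```python
from itertools import combinations

mu = {"w": 1.0, "x": 0.0, "y": 2.0, "z": 0.0}        # positive measure µ (point weights)
nu = {"w": 3.0, "x": -1j, "y": 0.0, "z": 2.0 + 1j}   # complex measure ν (point weights)


def measure(weights, E):
    return sum(weights[x] for x in E)


def close(u, v):
    return abs(u - v) < 1e-12


B = {x for x in mu if mu[x] == 0}   # the largest µ-null set, so |ν|(B) = b
A = set(mu) - B

nu_a = {x: (nu[x] if x in A else 0) for x in nu}   # ν_a(E) = ν(E ∩ A)
nu_s = {x: (nu[x] if x in B else 0) for x in nu}   # ν_s(E) = ν(E ∩ B)

subsets = [set(c) for r in range(len(mu) + 1) for c in combinations(mu, r)]

# ν = ν_a + ν_s on every set.
assert all(close(measure(nu, E), measure(nu_a, E) + measure(nu_s, E)) for E in subsets)
# ν_a ≪ µ: every µ-null set is ν_a-null.
assert all(close(measure(nu_a, E), 0) for E in subsets if measure(mu, E) == 0)
# ν_s ⊥ µ, witnessed by A and B: ν_s lives on B and µ lives on A.
assert all(
    close(measure(nu_s, E), measure(nu_s, E & B)) and close(measure(mu, E), measure(mu, E & A))
    for E in subsets
)
print(sorted(B), measure(nu_s, B), measure(nu_a, A))
```

In general the set B cannot be read off pointwise like this; the supremum defining b in the proof is what replaces the pointwise choice.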
Radon–Nikodym Theorem


If µ is a (positive) measure, h ∈ ℒ¹(µ), and dν = h dµ, then ν ≪ µ. The next result gives the important converse: if µ is σ-finite, then every complex measure that is absolutely continuous with respect to µ is of the form h dµ for some h ∈ ℒ¹(µ). The hypothesis that µ is σ-finite cannot be deleted.

The result below was first proved by Radon and Otto Nikodym (1887–1974).

9.36 Radon–Nikodym Theorem

Suppose µ is a (positive) σ-finite measure on a measurable space (X,S). Suppose ν is a complex measure on (X,S) such that ν ≪ µ. Then there exists h ∈ ℒ¹(µ) such that dν = h dµ.


Proof First consider the case where both µ and ν are finite (positive) measures. Define φ: L²(ν + µ) → R by

$$\varphi(f) = \int f \, d\nu. \tag{9.37}$$

To show that φ is well defined, first note that if f ∈ ℒ²(ν + µ), then

$$\int |f| \, d\nu \le \int |f| \, d(\nu + \mu) \le \bigl(\nu(X) + \mu(X)\bigr)^{1/2} \|f\|_{L^2(\nu + \mu)} < \infty, \tag{9.38}$$

where the middle inequality follows from Hölder's inequality (7.9) applied to the functions 1 and f. Now 9.38 shows that ∫ f dν makes sense for f ∈ ℒ²(ν + µ). Furthermore, if two functions in ℒ²(ν + µ) differ only on a set of (ν + µ)-measure 0, then they differ only on a set of ν-measure 0. Thus φ as defined in 9.37 makes sense as a linear functional on L²(ν + µ).

Because |φ(f)| ≤ ∫ |f| dν, 9.38 shows that φ is a bounded linear functional on L²(ν + µ). The Riesz Representation Theorem (8.47) now implies that there exists g ∈ ℒ²(ν + µ) such that

$$\int f \, d\nu = \int f g \, d(\nu + \mu)$$

for all f ∈ ℒ²(ν + µ). Hence

$$\int f (1 - g) \, d\nu = \int f g \, d\mu \tag{9.39}$$

for all f ∈ ℒ²(ν + µ).

The clever idea of using Hilbert space techniques in this proof comes from John von Neumann (1903–1957).

If f equals the characteristic function of {x ∈ X : g(x) ≥ 1}, then the left side of 9.39 is less than or equal to 0 and the right side of 9.39 is greater than or equal to 0; hence both sides are 0. Thus ∫ fg dµ = 0, which implies (with this choice of f) that µ({x ∈ X : g(x) ≥ 1}) = 0.
Similarly, if f equals the characteristic function of {x ∈ X : g(x) < 0}, then the left side of 9.39 is greater than or equal to 0 and the right side of 9.39 is less than or equal to 0; hence both sides are 0. Thus ∫ fg dµ = 0, which implies (with this choice of f) that µ({x ∈ X : g(x) < 0}) = 0.

Because ν ≪ µ, the two previous paragraphs imply that

$$\nu(\{x \in X : g(x) \ge 1\}) = 0 \quad\text{and}\quad \nu(\{x \in X : g(x) < 0\}) = 0.$$

Thus we can modify g (for example by redefining g to be ½ on the two sets appearing above; both those sets have ν-measure 0 and µ-measure 0) and from now on we can assume that 0 ≤ g(x) < 1 for all x ∈ X and that 9.39 holds for all f ∈ ℒ²(ν + µ). Hence we can define h: X → [0, ∞) by

$$h(x) = \frac{g(x)}{1 - g(x)}.$$

Suppose E ∈ S. Taking f = χ_E/(1 − g) in 9.39 would give ν(E) = ∫_E h dµ, but this function f might not be in ℒ²(ν + µ) and thus we need to be a bit more careful. For each k ∈ Z⁺, let

$$f_k(x) = \begin{cases} \dfrac{\chi_E(x)}{1 - g(x)} & \text{if } g(x) \le 1 - \frac{1}{k}, \\[1ex] 0 & \text{otherwise}. \end{cases}$$

Then f_k ∈ ℒ²(ν + µ), because 0 ≤ f_k ≤ k and ν + µ is a finite measure. Now 9.39 implies

$$\int f_k (1 - g) \, d\nu = \int f_k g \, d\mu.$$

Taking the limit as k → ∞ and using the Monotone Convergence Theorem (3.11) shows that

$$\nu(E) = \int_E h \, d\mu. \tag{9.40}$$

Thus dν = h dµ, completing the proof in the case where both ν and µ are (positive) finite measures [note that h ∈ ℒ¹(µ) because h is a nonnegative function and we can take E = X in the equation above].

Now relax the assumption on µ to the hypothesis that µ is a σ-finite measure. Thus there exists an increasing sequence X1 ⊂ X2 ⊂ ··· of sets in S such that $\bigcup_{k=1}^{\infty} X_k = X$ and µ(X_k) < ∞ for each k ∈ Z⁺. For k ∈ Z⁺, let ν_k and µ_k denote the restrictions of ν and µ to the σ-algebra on X_k consisting of those sets in S that are subsets of X_k. Then ν_k ≪ µ_k. Thus by the case we have already proved, there exists a nonnegative function h_k ∈ ℒ¹(µ_k) such that dν_k = h_k dµ_k. If j < k, then

$$\int_E h_j \, d\mu = \nu(E) = \int_E h_k \, d\mu$$

for every set E ∈ S with E ⊂ X_j; thus µ({x ∈ X_j : h_j(x) ≠ h_k(x)}) = 0. Hence there exists an S-measurable function h: X → [0, ∞) such that

$$\mu(\{x \in X_k : h(x) \neq h_k(x)\}) = 0$$

for every k ∈ Z⁺. The Monotone Convergence Theorem (3.11) can now be used to show that 9.40 holds for every E ∈ S. Thus dν = h dµ, completing the proof in the case where ν is a (positive) finite measure.
Now relax the assumption on ν to the assumption that ν is a real measure. The measure ν equals one-half the difference of the two positive (finite) measures |ν| + ν and |ν| − ν, each of which is absolutely continuous with respect to µ (because ν ≪ µ implies |ν| ≪ µ). By the case proved in the previous paragraph, there exist h₊, h₋ ∈ ℒ¹(µ) such that

$$d(|\nu| + \nu) = h_+ \, d\mu \quad\text{and}\quad d(|\nu| - \nu) = h_- \, d\mu.$$

Taking h = ½(h₊ − h₋), we have dν = h dµ, completing the proof in the case where ν is a real measure.

Finally, if ν is a complex measure, apply the result in the previous paragraph to the real measures Re ν, Im ν, producing h_Re, h_Im ∈ ℒ¹(µ) such that d(Re ν) = h_Re dµ and d(Im ν) = h_Im dµ. Taking h = h_Re + i h_Im, we have dν = h dµ, completing the proof in the case where ν is a complex measure.

The function h provided by the Radon–Nikodym Theorem is unique up to changes on sets with µ-measure 0. If we think of h as an element of L¹(µ) instead of ℒ¹(µ), then the choice of h is unique.

When dν = h dµ, the notation h = dν/dµ is used by some authors, and h is called the Radon–Nikodym derivative of ν with respect to µ.
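When X is finite and S is its power set, the Radon–Nikodym derivative can be written down directly: dν/dµ at a point x is ν({x})/µ({x}) whenever µ({x}) > 0, and its value where µ({x}) = 0 does not matter (absolute continuity forces ν({x}) = 0 there). The Python sketch below uses the standard-library `fractions` module for exact arithmetic; the data and the function name `radon_nikodym` are hypothetical. Any measure on a finite set with finite point weights is finite, hence σ-finite, so 9.36 applies.

```python
from fractions import Fraction as Fr
from itertools import combinations

mu = {1: Fr(1, 2), 2: Fr(1, 4), 3: Fr(0), 4: Fr(3)}    # positive measure (point weights)
nu = {1: Fr(1), 2: Fr(-3, 4), 3: Fr(0), 4: Fr(1, 2)}   # real measure with ν ≪ µ


def radon_nikodym(nu, mu):
    """Return h with dν = h dµ; h is set to 0 (an arbitrary choice) on µ-null points."""
    if any(mu[x] == 0 and nu[x] != 0 for x in mu):
        raise ValueError("nu is not absolutely continuous with respect to mu")
    return {x: (nu[x] / mu[x] if mu[x] != 0 else Fr(0)) for x in mu}


h = radon_nikodym(nu, mu)
for r in range(len(mu) + 1):
    for E in combinations(mu, r):
        # ν(E) equals ∫_E h dµ, which here is the sum of h(x) µ({x}) over x in E.
        assert sum(nu[x] for x in E) == sum(h[x] * mu[x] for x in E)
print(h)
```

Changing h at the µ-null point 3 would not affect any of these checks, matching the uniqueness statement above.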

The next result is a nice consequence of the Radon–Nikodym Theorem.


9.41 if ν is a complex measure, then dν = h d|ν| for some h with |h(x)| = 1

(a) Suppose ν is a real measure on a measurable space (X,S). Then there exists an S-measurable function h: X → {−1, 1} such that dν = h d|ν|.

(b) Suppose ν is a complex measure on a measurable space (X,S). Then there exists an S-measurable function h: X → {z ∈ C : |z| = 1} such that dν = h d|ν|.

Proof Because ν ≪ |ν| and |ν| is a finite (hence σ-finite) measure by 9.17, the Radon–Nikodym Theorem (9.36) tells us that there exists h ∈ ℒ¹(|ν|) (with h real valued if ν is a real measure) such that dν = h d|ν|. Now 9.10 implies that d|ν| = |h| d|ν|, which implies that |h| = 1 almost everywhere (with respect to |ν|). Redefine h to be 1 on the set {x ∈ X : |h(x)| ≠ 1}, which gives the desired result.

We could have proved part (a) of the result above by taking h = χ_A − χ_B in the Hahn Decomposition Theorem (9.23).

Conversely, we could give a new proof of the Hahn Decomposition Theorem by using part (a) of the result above and taking

$$A = \{x \in X : h(x) = 1\} \quad\text{and}\quad B = \{x \in X : h(x) = -1\}.$$

We could also give a new proof of the Jordan Decomposition Theorem (9.30) by using part (a) of the result above and taking

$$\nu^+ = \chi_{\{x \in X : h(x) = 1\}} \, d|\nu| \quad\text{and}\quad \nu^- = \chi_{\{x \in X : h(x) = -1\}} \, d|\nu|.$$
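For a complex measure on a finite set (hypothetical data below, S the power set), the function h in 9.41 is simply h(x) = ν({x})/|ν({x})|, with the value 1 at points where ν({x}) = 0, mirroring the redefinition in the proof. The Python sketch checks |h| = 1 and ν(E) = ∫_E h d|ν| on every subset, using |ν|({x}) = |ν({x})| (9.10 with counting measure).

```python
from itertools import combinations

nu = {"a": 3 + 4j, "b": -2.0, "c": 0.0, "d": 1j}   # ν({x}) for x in X

abs_nu = {x: abs(w) for x, w in nu.items()}                      # |ν|({x})
h = {x: (w / abs(w) if w != 0 else 1.0) for x, w in nu.items()}  # h = 1 where ν({x}) = 0

for r in range(len(nu) + 1):
    for E in combinations(nu, r):
        integral = sum(h[x] * abs_nu[x] for x in E)   # ∫_E h d|ν|
        assert abs(integral - sum(nu[x] for x in E)) < 1e-12
assert all(abs(abs(h[x]) - 1) < 1e-12 for x in nu)
print(h)
```

Here the value of h at the point c is irrelevant because |ν|({c}) = 0, just as the set where h is redefined in the proof has |ν|-measure 0.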
Dual Space of L^p(µ)


Recall that the dual space of a normed vector space V is the Banach space of bounded linear functionals on V; the dual space of V is denoted by V′. Recall also that if 1 ≤ p ≤ ∞, then the dual exponent p′ is defined by the equation 1/p + 1/p′ = 1.

The dual space of ℓ^p can be identified with ℓ^{p′} for 1 ≤ p < ∞, as we saw in 7.26. We are now ready to prove the analogous result for an arbitrary (positive) measure, identifying the dual space of L^p(µ) with L^{p′}(µ) [with the mild restriction that µ is σ-finite if p = 1]. In the special case where µ is counting measure on Z⁺, this new result reduces to the previous result about ℓ^p.

For 1 < p < ∞, the next result differs from 7.25 by only one word, with "to" in 7.25 changed to "onto" below. Thus we already know (and will use in the proof) that the map h ↦ φ_h is a one-to-one linear map from L^{p′}(µ) to (L^p(µ))′ and that ‖φ_h‖ = ‖h‖_{p′} for all h ∈ L^{p′}(µ). The new aspect of the result below is the assertion that every bounded linear functional on L^p(µ) is of the form φ_h for some h ∈ L^{p′}(µ). The key tool we use in proving this new assertion is the Radon–Nikodym Theorem.
The key tool we use in proving this new assertion is the Radon—Nikodym Theorem. 


9.42 dual space of $L^p(\mu)$ is $L^{p'}(\mu)$


Suppose $\mu$ is a (positive) measure and $1 \le p < \infty$ [with the additional hypothesis that $\mu$ is a $\sigma$-finite measure if $p = 1$]. For $h \in L^{p'}(\mu)$, define $\varphi_h\colon L^p(\mu) \to \mathbf{F}$ by
$$\varphi_h(f) = \int f h \, d\mu.$$
Then $h \mapsto \varphi_h$ is a one-to-one linear map from $L^{p'}(\mu)$ onto $(L^p(\mu))'$. Furthermore, $\|\varphi_h\| = \|h\|_{p'}$ for all $h \in L^{p'}(\mu)$.


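Proof The case $p = 1$ is left to the reader as an exercise. Thus assume that $1 < p < \infty$.

Suppose $\mu$ is a (positive) measure on a measurable space $(X, \mathcal{S})$ and $\varphi$ is a bounded linear functional on $L^p(\mu)$; in other words, suppose $\varphi \in (L^p(\mu))'$.

Consider first the case where $\mu$ is a finite (positive) measure. Define a function $\nu\colon \mathcal{S} \to \mathbf{F}$ by
$$\nu(E) = \varphi(\chi_E).$$
If $E_1, E_2, \ldots$ are disjoint sets in $\mathcal{S}$, then
$$\nu\Bigl(\bigcup_{k=1}^\infty E_k\Bigr) = \varphi\bigl(\chi_{\bigcup_{k=1}^\infty E_k}\bigr) = \varphi\Bigl(\sum_{k=1}^\infty \chi_{E_k}\Bigr) = \sum_{k=1}^\infty \varphi(\chi_{E_k}) = \sum_{k=1}^\infty \nu(E_k),$$
where the infinite sum in the third term converges in the $L^p(\mu)$-norm to $\chi_{\bigcup_{k=1}^\infty E_k}$ (because $\mu$ is finite and $p < \infty$), and the third equality holds because $\varphi$ is a continuous linear functional. The equation above shows that $\nu$ is countably additive. Thus $\nu$ is a complex measure on $(X, \mathcal{S})$ [and is a real measure if $\mathbf{F} = \mathbf{R}$].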
If $E \in \mathcal{S}$ and $\mu(E) = 0$, then $\chi_E$ is the $0$ element of $L^p(\mu)$, which implies that $\varphi(\chi_E) = 0$, which means that $\nu(E) = 0$. Hence $\nu \ll \mu$. By the Radon–Nikodym Theorem (9.36), there exists $h \in \mathcal{L}^1(\mu)$ such that $d\nu = h \, d\mu$. Hence
$$\varphi(\chi_E) = \nu(E) = \int_E h \, d\mu = \int \chi_E h \, d\mu$$
for every $E \in \mathcal{S}$. The equation above, along with the linearity of $\varphi$, implies that
$$\varphi(f) = \int f h \, d\mu \quad \text{for every simple } \mathcal{S}\text{-measurable function } f\colon X \to \mathbf{F}. \tag{9.43}$$


Because every bounded $\mathcal{S}$-measurable function is the uniform limit on $X$ of a sequence of simple $\mathcal{S}$-measurable functions (see 2.89), we can conclude from 9.43 that
$$\varphi(f) = \int f h \, d\mu \quad \text{for every } f \in L^\infty(\mu). \tag{9.44}$$


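For $k \in \mathbf{Z}^+$, let
$$E_k = \{x \in X : 0 < |h(x)| \le k\}$$
and define $f_k \in L^p(\mu)$ by
$$f_k(x) = \begin{cases} \overline{h(x)} \, |h(x)|^{p'-2} & \text{if } x \in E_k, \\ 0 & \text{otherwise.} \end{cases} \tag{9.45}$$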


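Now
$$\int |h|^{p'} \chi_{E_k} \, d\mu = \varphi(f_k) \le \|\varphi\| \, \|f_k\|_p = \|\varphi\| \Bigl(\int |h|^{p'} \chi_{E_k} \, d\mu\Bigr)^{1/p},$$
where the first equality follows from 9.44 and 9.45 [note that $f_k$ is bounded because $|h| \le k$ on $E_k$], and the last equality follows from 9.45 [which implies that $|f_k(x)|^p = |h(x)|^{p'} \chi_{E_k}(x)$ for $x \in X$]. After dividing by $\bigl(\int |h|^{p'} \chi_{E_k} \, d\mu\bigr)^{1/p}$ (if this quantity is $0$, the next inequality holds trivially), the inequality between the first and last terms above becomes
$$\|h \chi_{E_k}\|_{p'} \le \|\varphi\|.$$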


Taking the limit as $k \to \infty$ shows, via the Monotone Convergence Theorem (3.11), that
$$\|h\|_{p'} \le \|\varphi\|.$$
Thus $h \in L^{p'}(\mu)$. Because each $f \in L^p(\mu)$ can be approximated in the $L^p(\mu)$ norm by functions in $L^\infty(\mu)$, 9.44 now shows that $\varphi = \varphi_h$, completing the proof in the case where $\mu$ is a finite (positive) measure.

Now relax the assumption that $\mu$ is a finite (positive) measure to the hypothesis that $\mu$ is a (positive) measure. For $E \in \mathcal{S}$, let $\mathcal{S}_E = \{A \in \mathcal{S} : A \subset E\}$ and let $\mu_E$ be the (positive) measure on $(E, \mathcal{S}_E)$ defined by $\mu_E(A) = \mu(A)$ for $A \in \mathcal{S}_E$. We can identify $L^p(\mu_E)$ with the subspace of functions in $L^p(\mu)$ that vanish (almost everywhere) outside $E$. With this identification, let $\varphi_E = \varphi|_{L^p(\mu_E)}$. Then $\varphi_E$ is a bounded linear functional on $L^p(\mu_E)$ and $\|\varphi_E\| \le \|\varphi\|$.
If $E \in \mathcal{S}$ and $\mu(E) < \infty$, then the finite measure case that we have already proved, as applied to $\varphi_E$, implies that there exists a unique $h_E \in L^{p'}(\mu_E)$ such that
$$\varphi(f) = \int f h_E \, d\mu \quad \text{for all } f \in L^p(\mu_E). \tag{9.46}$$


If $D, E \in \mathcal{S}$ and $D \subset E$ with $\mu(E) < \infty$, then $h_D(x) = h_E(x)$ for almost every $x \in D$ (use the uniqueness part of the result).
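For each $k \in \mathbf{Z}^+$, there exists $f_k \in L^p(\mu)$ such that
$$\|f_k\|_p \le 1 \quad \text{and} \quad |\varphi(f_k)| > \|\varphi\| - \frac{1}{k}. \tag{9.47}$$
The Dominated Convergence Theorem (3.31) implies that
$$\lim_{n \to \infty} \bigl\| f_k \chi_{\{x \in X : |f_k(x)| \ge 1/n\}} - f_k \bigr\|_p = 0$$
for each $k \in \mathbf{Z}^+$. Thus we can replace $f_k$ by $f_k \chi_{\{x \in X : |f_k(x)| \ge 1/n\}}$ for sufficiently large $n$ and still have 9.47 hold. In other words, for each $k \in \mathbf{Z}^+$, we can assume that there exists $n_k \in \mathbf{Z}^+$ such that for each $x \in X$, either $|f_k(x)| \ge 1/n_k$ or $f_k(x) = 0$.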

Set $D_k = \{x \in X : |f_k(x)| \ge 1/n_k\}$. Then $\mu(D_k) < \infty$ [because $f_k \in L^p(\mu)$] and
$$f_k(x) = 0 \quad \text{for all } x \in X \setminus D_k. \tag{9.48}$$


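For $k \in \mathbf{Z}^+$, let $E_k = D_1 \cup \cdots \cup D_k$. Because $E_1 \subset E_2 \subset \cdots$, we see that if $j < k$, then $h_{E_j}(x) = h_{E_k}(x)$ for almost every $x \in E_j$. Also, 9.47 and 9.48 imply that
$$\lim_{k \to \infty} \|h_{E_k}\|_{p'} = \lim_{k \to \infty} \|\varphi_{E_k}\| = \|\varphi\|. \tag{9.49}$$
[The first equality comes from the norm equality in the finite measure case. For the second, note that $f_k \in L^p(\mu_{E_k})$ by 9.48, so $\|\varphi\| - 1/k < |\varphi(f_k)| \le \|\varphi_{E_k}\| \le \|\varphi\|$.]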


Let $E = \bigcup_{k=1}^\infty E_k$. Let $h$ be the function that equals $h_{E_k}$ almost everywhere on $E_k$ for each $k \in \mathbf{Z}^+$ and equals $0$ on $X \setminus E$. The Monotone Convergence Theorem and 9.49 show that
$$\|h\|_{p'} = \|\varphi\|.$$

If $f \in L^p(\mu_E)$, then $\lim_{k \to \infty} \|f - f\chi_{E_k}\|_p = 0$ by the Dominated Convergence Theorem. Thus if $f \in L^p(\mu_E)$, then
$$\varphi(f) = \lim_{k \to \infty} \varphi(f\chi_{E_k}) = \lim_{k \to \infty} \int f \chi_{E_k} h \, d\mu = \int f h \, d\mu, \tag{9.50}$$
where the first equality follows from the continuity of $\varphi$, the second equality follows from 9.46 as applied to each $E_k$ [valid because $\mu(E_k) < \infty$], and the third equality follows from the Dominated Convergence Theorem.

If $D$ is an $\mathcal{S}$-measurable subset of $X \setminus E$ with $\mu(D) < \infty$, then $\|h_D\|_{p'} = 0$ because otherwise we would have $\|h + h_D\|_{p'} > \|h\|_{p'}$ and the linear functional on $L^p(\mu)$ induced by $h + h_D$ would have norm larger than $\|\varphi\|$ even though it agrees with $\varphi$ on $L^p(\mu_{E \cup D})$. Because $\|h_D\|_{p'} = 0$, we see from 9.50 that $\varphi(f) = \int f h \, d\mu$ for all $f \in L^p(\mu_{E \cup D})$.

Every element of $L^p(\mu)$ can be approximated in norm by elements of $L^p(\mu_E)$ plus functions that live on subsets of $X \setminus E$ with finite measure. Thus the previous paragraph implies that $\varphi(f) = \int f h \, d\mu$ for all $f \in L^p(\mu)$, completing the proof.
EXERCISES 9B


1 Suppose $\nu$ is a real measure on a measurable space $(X, \mathcal{S})$. Prove that the Hahn decomposition of $\nu$ is almost unique, in the sense that if $A, B$ and $A', B'$ are pairs satisfying the Hahn Decomposition Theorem (9.23), then
$$|\nu|(A \setminus A') = |\nu|(A' \setminus A) = |\nu|(B \setminus B') = |\nu|(B' \setminus B) = 0.$$

2 Suppose $\mu$ is a (positive) measure and $g, h \in \mathcal{L}^1(\mu)$. Prove that $g \, d\mu \perp h \, d\mu$ if and only if $g(x)h(x) = 0$ for almost every $x \in X$.


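3 Suppose $\nu$ and $\mu$ are complex measures on a measurable space $(X, \mathcal{S})$. Show that the following are equivalent:

(a) $\nu \perp \mu$.
(b) $|\nu| \perp |\mu|$.
(c) $\operatorname{Re} \nu \perp \mu$ and $\operatorname{Im} \nu \perp \mu$.


4 Suppose $\nu$ and $\mu$ are complex measures on a measurable space $(X, \mathcal{S})$. Prove that if $\nu \perp \mu$, then $|\nu + \mu| = |\nu| + |\mu|$ and $\|\nu + \mu\| = \|\nu\| + \|\mu\|$.


5 Suppose $\nu$ and $\mu$ are finite (positive) measures. Prove that $\nu \perp \mu$ if and only if $\|\nu - \mu\| = \|\nu\| + \|\mu\|$.

6 Suppose $\mu$ is a complex or positive measure on a measurable space $(X, \mathcal{S})$. Prove that
$$\{\nu \in \mathcal{M}_{\mathbf{F}}(\mathcal{S}) : \nu \perp \mu\}$$
is a closed subspace of $\mathcal{M}_{\mathbf{F}}(\mathcal{S})$.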


7 Use the Cantor set to prove that there exists a (positive) measure $\nu$ on $(\mathbf{R}, \mathcal{B})$ such that $\nu \perp \lambda$ and $\nu(\mathbf{R}) \ne 0$ but $\nu(\{x\}) = 0$ for every $x \in \mathbf{R}$; here $\lambda$ denotes Lebesgue measure on the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of $\mathbf{R}$.

[The second bullet point in Example 9.29 does not provide an example of the desired behavior because in that example, $\nu(\{r_k\}) \ne 0$ for all $k \in \mathbf{Z}^+$ with $w_k \ne 0$.]


8 Suppose $\nu$ is a real measure on a measurable space $(X, \mathcal{S})$. Prove that
$$\nu^+(E) = \sup\{\nu(D) : D \in \mathcal{S} \text{ and } D \subset E\}$$
and
$$\nu^-(E) = -\inf\{\nu(D) : D \in \mathcal{S} \text{ and } D \subset E\}.$$


9 Suppose $\mu$ is a (positive) finite measure on a measurable space $(X, \mathcal{S})$ and $h$ is a nonnegative function in $\mathcal{L}^1(\mu)$. Thus $h \, d\mu \ll d\mu$. Find a reasonable condition on $h$ that is equivalent to the condition $d\mu \ll h \, d\mu$.


10 Suppose $\mu$ is a (positive) measure on a measurable space $(X, \mathcal{S})$ and $\nu$ is a complex measure on $(X, \mathcal{S})$. Show that the following are equivalent:

(a) $\nu \ll \mu$.
(b) $|\nu| \ll \mu$.
(c) $\operatorname{Re} \nu \ll \mu$ and $\operatorname{Im} \nu \ll \mu$.


11 Suppose $\mu$ is a (positive) measure on a measurable space $(X, \mathcal{S})$ and $\nu$ is a real measure on $(X, \mathcal{S})$. Show that $\nu \ll \mu$ if and only if $\nu^+ \ll \mu$ and $\nu^- \ll \mu$.


12 Suppose $\mu$ is a (positive) measure on a measurable space $(X, \mathcal{S})$. Prove that
$$\{\nu \in \mathcal{M}_{\mathbf{F}}(\mathcal{S}) : \nu \ll \mu\}$$
is a closed subspace of $\mathcal{M}_{\mathbf{F}}(\mathcal{S})$.


13 Give an example to show that the Radon–Nikodym Theorem (9.36) can fail if the $\sigma$-finite hypothesis is eliminated.


14 Suppose $\mu$ is a (positive) $\sigma$-finite measure on a measurable space $(X, \mathcal{S})$ and $\nu$ is a complex measure on $(X, \mathcal{S})$. Show that the following are equivalent:

(a) $\nu \ll \mu$.

(b) for every $\varepsilon > 0$, there exists $\delta > 0$ such that $|\nu(E)| < \varepsilon$ for every set $E \in \mathcal{S}$ with $\mu(E) < \delta$.

(c) for every $\varepsilon > 0$, there exists $\delta > 0$ such that $|\nu|(E) < \varepsilon$ for every set $E \in \mathcal{S}$ with $\mu(E) < \delta$.


15 Prove 9.42 [with the extra hypothesis that $\mu$ is a $\sigma$-finite (positive) measure] in the case where $p = 1$.


16 Explain where the proof of 9.42 fails if $p = \infty$.


17 Prove that if $\mu$ is a (positive) measure and $1 < p < \infty$, then $L^p(\mu)$ is reflexive. [See the definition before Exercise 19 in Section 7B for the meaning of reflexive.]


18 Prove that $L^1(\mathbf{R})$ is not reflexive.
Chapter 10


Linear Maps on Hilbert Spaces 


A special tool called the adjoint helps provide insight into the behavior of linear maps 
on Hilbert spaces. This chapter begins with a study of the adjoint and its connection 
to the null space and range of a linear map. 

Then we discuss various issues connected with the invertibility of operators on 
Hilbert spaces. These issues lead to the spectrum, which is a set of numbers that 
gives important information about an operator. 

This chapter then looks at special classes of operators on Hilbert spaces: self- 
adjoint operators, normal operators, isometries, unitary operators, integral operators, 
and compact operators. 

Even on infinite-dimensional Hilbert spaces, compact operators display many 
characteristics expected from finite-dimensional linear algebra. We will see that 
the powerful Spectral Theorem for compact operators greatly resembles the finite- 
dimensional version. Also, we develop the Singular Value Decomposition for an 
arbitrary compact operator, again quite similar to the finite-dimensional result. 


The Botanical Garden at Uppsala University (the oldest university in Sweden, 
founded in 1477), where Erik Fredholm (1866-1927) was a student. The theorem 
called the Fredholm Alternative, which we prove in this chapter, states that a 
compact operator minus a nonzero scalar multiple of the identity operator 
is injective if and only if it is surjective. 
10A Adjoints and Invertibility


Adjoints of Linear Maps on Hilbert Spaces 


The next definition provides a key tool for studying linear maps on Hilbert spaces. 


10.1 Definition adjoint; $T^*$


Suppose $V$ and $W$ are Hilbert spaces and $T\colon V \to W$ is a bounded linear map. The adjoint of $T$ is the function $T^*\colon W \to V$ such that
$$\langle Tf, g \rangle = \langle f, T^*g \rangle$$
for every $f \in V$ and every $g \in W$.


To see why the definition above makes sense, fix $g \in W$. Consider the linear functional on $V$ defined by $f \mapsto \langle Tf, g \rangle$. This linear functional is bounded because
$$|\langle Tf, g \rangle| \le \|Tf\| \, \|g\| \le \|T\| \, \|g\| \, \|f\|$$
for all $f \in V$; thus the linear functional $f \mapsto \langle Tf, g \rangle$ has norm at most $\|T\| \, \|g\|$. By the Riesz Representation Theorem (8.47), there exists a unique element of $V$ (with norm at most $\|T\| \, \|g\|$) such that this linear functional is given by taking the inner product with it. We call this unique element $T^*g$. In other words, $T^*g$ is the unique element of $V$ such that
$$\langle Tf, g \rangle = \langle f, T^*g \rangle \tag{10.2}$$
for every $f \in V$. Furthermore,
$$\|T^*g\| \le \|T\| \, \|g\|. \tag{10.3}$$

In 10.2, notice that the inner product on the left is the inner product in $W$ and the inner product on the right is the inner product in $V$.

The word adjoint has two unrelated meanings in linear algebra. We need only the meaning defined above.
10.4 Example multiplication operators

Suppose $(X, \mathcal{S}, \mu)$ is a measure space and $h \in \mathcal{L}^\infty(\mu)$. Define the multiplication operator $M_h\colon L^2(\mu) \to L^2(\mu)$ by
$$M_h f = fh.$$
Then $M_h$ is a bounded linear map and $\|M_h\| \le \|h\|_\infty$. Because
$$\langle M_h f, g \rangle = \int f h \bar{g} \, d\mu = \langle f, M_{\bar{h}} g \rangle$$
for all $f, g \in L^2(\mu)$, we have $M_h^* = M_{\bar{h}}$.

The complex conjugates that appear in this example are unnecessary (but they do no harm) if $\mathbf{F} = \mathbf{R}$.
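10.5 Example linear maps induced by integration


Suppose $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces and $K \in \mathcal{L}^2(\mu \times \nu)$. Define a linear map $\mathcal{I}_K\colon L^2(\nu) \to L^2(\mu)$ by
$$(\mathcal{I}_K f)(x) = \int_Y K(x, y) f(y) \, d\nu(y) \tag{10.6}$$
for $f \in L^2(\nu)$ and $x \in X$. To see that this definition makes sense, first note that there are no worrisome measurability issues because for each $x \in X$, the function $y \mapsto K(x, y)$ is a $\mathcal{T}$-measurable function on $Y$ (see 5.9).

Suppose $f \in L^2(\nu)$. Use the Cauchy–Schwarz inequality (8.11) or Hölder's inequality (7.9) to show that
$$\int_Y |K(x, y)| \, |f(y)| \, d\nu(y) \le \Bigl(\int_Y |K(x, y)|^2 \, d\nu(y)\Bigr)^{1/2} \|f\|_{L^2(\nu)} \tag{10.7}$$
for every $x \in X$. Squaring both sides of the inequality above and then integrating on $X$ with respect to $\mu$ gives
$$\begin{aligned} \int_X \Bigl(\int_Y |K(x, y)| \, |f(y)| \, d\nu(y)\Bigr)^2 d\mu(x) &\le \Bigl(\int_X \int_Y |K(x, y)|^2 \, d\nu(y) \, d\mu(x)\Bigr) \|f\|_{L^2(\nu)}^2 \\ &= \|K\|_{L^2(\mu \times \nu)}^2 \, \|f\|_{L^2(\nu)}^2, \end{aligned}$$
where the last line holds by Tonelli's Theorem (5.28). The inequality above implies that the integral on the left side of 10.7 is finite for $\mu$-almost every $x \in X$. Thus the integral in 10.6 makes sense for $\mu$-almost every $x \in X$. Now the last inequality above shows that
$$\|\mathcal{I}_K f\|_{L^2(\mu)}^2 = \int_X |(\mathcal{I}_K f)(x)|^2 \, d\mu(x) \le \|K\|_{L^2(\mu \times \nu)}^2 \, \|f\|_{L^2(\nu)}^2.$$
Thus $\mathcal{I}_K$ is a bounded linear map from $L^2(\nu)$ to $L^2(\mu)$ and
$$\|\mathcal{I}_K\| \le \|K\|_{L^2(\mu \times \nu)}. \tag{10.8}$$
Define $K^*\colon Y \times X \to \mathbf{F}$ by
$$K^*(y, x) = \overline{K(x, y)},$$
and note that $K^* \in \mathcal{L}^2(\nu \times \mu)$. Thus $\mathcal{I}_{K^*}\colon L^2(\mu) \to L^2(\nu)$ is a bounded linear map. Using Tonelli's Theorem (5.28) and Fubini's Theorem (5.32), we have
$$\begin{aligned} \langle \mathcal{I}_K f, g \rangle &= \int_X \int_Y K(x, y) f(y) \, d\nu(y) \, \overline{g(x)} \, d\mu(x) \\ &= \int_Y f(y) \int_X K(x, y) \overline{g(x)} \, d\mu(x) \, d\nu(y) \\ &= \int_Y f(y) \overline{(\mathcal{I}_{K^*} g)(y)} \, d\nu(y) = \langle f, \mathcal{I}_{K^*} g \rangle \end{aligned}$$
for all $f \in L^2(\nu)$ and all $g \in L^2(\mu)$. Thus
$$(\mathcal{I}_K)^* = \mathcal{I}_{K^*}. \tag{10.9}$$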
10.10 Example linear maps induced by matrices


As a special case of the previous example, suppose $m, n \in \mathbf{Z}^+$, $\mu$ is counting measure on $\{1, \ldots, m\}$, $\nu$ is counting measure on $\{1, \ldots, n\}$, and $K$ is an $m$-by-$n$ matrix with entry $K(i, j) \in \mathbf{F}$ in row $i$, column $j$. In this case, the linear map $\mathcal{I}_K\colon L^2(\nu) \to L^2(\mu)$ induced by integration is given by the equation
$$(\mathcal{I}_K f)(i) = \sum_{j=1}^n K(i, j) f(j)$$
for $f \in L^2(\nu)$. If we identify $L^2(\nu)$ and $L^2(\mu)$ with $\mathbf{F}^n$ and $\mathbf{F}^m$ and then think of elements of $\mathbf{F}^n$ and $\mathbf{F}^m$ as column vectors, then the equation above shows that the linear map $\mathcal{I}_K\colon \mathbf{F}^n \to \mathbf{F}^m$ is simply matrix multiplication by $K$.

In this setting, $K^*$ is called the conjugate transpose of $K$ because the $n$-by-$m$ matrix $K^*$ is obtained by interchanging the rows and the columns of $K$ and then taking the complex conjugate of each entry.

The previous example now shows that
$$\|\mathcal{I}_K\| \le \Bigl(\sum_{i=1}^m \sum_{j=1}^n |K(i, j)|^2\Bigr)^{1/2}.$$
Furthermore, the previous example shows that the adjoint of the linear map of multiplication by the matrix $K$ is the linear map of multiplication by the conjugate transpose matrix $K^*$, a result that may be familiar to you from linear algebra.
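Example 10.10 can be spot-checked numerically. The Python sketch below (standard library only; the random test vectors, the seed, and the helper names are illustrative choices, not part of the text) verifies $\langle \mathcal{I}_K f, g \rangle = \langle f, \mathcal{I}_{K^*} g \rangle$ for a complex $3$-by-$2$ matrix and tests the bound $\|\mathcal{I}_K f\| \le (\sum_{i,j} |K(i,j)|^2)^{1/2} \|f\|$; the right side of that bound is often called the Frobenius or Hilbert–Schmidt norm of $K$.

```python
# Sketch: Example 10.10 in coordinates. I_K is multiplication by an m-by-n
# complex matrix K, its adjoint is multiplication by the conjugate transpose K*,
# and ||I_K f|| <= (sum of |K(i, j)|^2)^(1/2) * ||f||.
# The sizes, the random entries, and the helper names are illustrative choices.
import math
import random

def matvec(K, f):
    return [sum(K[i][j] * f[j] for j in range(len(f))) for i in range(len(K))]

def conj_transpose(K):
    return [[K[i][j].conjugate() for i in range(len(K))] for j in range(len(K[0]))]

def inner(u, v):
    """<u, v> = sum of u_k * conj(v_k): linear in the first slot, as in the text."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(v):
    return math.sqrt(inner(v, v).real)

def random_complex():
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))

random.seed(0)
m, n = 3, 2
K = [[random_complex() for _ in range(n)] for _ in range(m)]
K_star = conj_transpose(K)
hs_bound = math.sqrt(sum(abs(K[i][j]) ** 2 for i in range(m) for j in range(n)))

for _ in range(100):
    f = [random_complex() for _ in range(n)]   # f in L^2(nu), identified with F^n
    g = [random_complex() for _ in range(m)]   # g in L^2(mu), identified with F^m
    assert abs(inner(matvec(K, f), g) - inner(f, matvec(K_star, g))) < 1e-12
    assert norm(matvec(K, f)) <= hs_bound * norm(f) + 1e-12
```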


If T is a bounded linear map from a Hilbert space V to a Hilbert space W, then 
the adjoint T* has been defined as a function from W to V. We now show that the 
adjoint T* is linear and bounded. Recall that B(V,W) denotes the Banach space of 
bounded linear maps from V to W. 


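10.11 $T^*$ is a bounded linear map


Suppose $V$ and $W$ are Hilbert spaces and $T \in \mathcal{B}(V, W)$. Then
$$T^* \in \mathcal{B}(W, V), \quad (T^*)^* = T, \quad \text{and} \quad \|T^*\| = \|T\|.$$


Proof Suppose $g_1, g_2 \in W$. Then
$$\begin{aligned} \langle f, T^*(g_1 + g_2) \rangle &= \langle Tf, g_1 + g_2 \rangle = \langle Tf, g_1 \rangle + \langle Tf, g_2 \rangle \\ &= \langle f, T^*g_1 \rangle + \langle f, T^*g_2 \rangle \\ &= \langle f, T^*g_1 + T^*g_2 \rangle \end{aligned}$$
for all $f \in V$. Thus $T^*(g_1 + g_2) = T^*g_1 + T^*g_2$.

Suppose $\alpha \in \mathbf{F}$ and $g \in W$. Then
$$\langle f, T^*(\alpha g) \rangle = \langle Tf, \alpha g \rangle = \bar{\alpha} \langle Tf, g \rangle = \bar{\alpha} \langle f, T^*g \rangle = \langle f, \alpha T^*g \rangle$$
for all $f \in V$. Thus $T^*(\alpha g) = \alpha T^*g$.

We have now shown that $T^*\colon W \to V$ is a linear map. From 10.3, we see that $T^*$ is bounded. In other words, $T^* \in \mathcal{B}(W, V)$.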
Because $T^* \in \mathcal{B}(W, V)$, its adjoint $(T^*)^*\colon V \to W$ is defined. Suppose $f \in V$. Then
$$\langle (T^*)^*f, g \rangle = \overline{\langle g, (T^*)^*f \rangle} = \overline{\langle T^*g, f \rangle} = \langle f, T^*g \rangle = \langle Tf, g \rangle$$
for all $g \in W$. Thus $(T^*)^*f = Tf$, and hence $(T^*)^* = T$.

From 10.3, we see that $\|T^*\| \le \|T\|$. Applying this inequality with $T$ replaced by $T^*$, we have
$$\|T^*\| \le \|T\| = \|(T^*)^*\| \le \|T^*\|.$$
Because the first and last terms above are the same, the first inequality must be an equality. In other words, we have $\|T^*\| = \|T\|$.


Parts (a) and (b) of the next result show that if $V$ and $W$ are real Hilbert spaces, then the function $T \mapsto T^*$ from $\mathcal{B}(V, W)$ to $\mathcal{B}(W, V)$ is a linear map. However, if $V$ and $W$ are nonzero complex Hilbert spaces, then $T \mapsto T^*$ is not a linear map because of the complex conjugate in (b).


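10.12 properties of the adjoint


Suppose $V$, $W$, and $U$ are Hilbert spaces. Then

(a) $(S + T)^* = S^* + T^*$ for all $S, T \in \mathcal{B}(V, W)$;

(b) $(\alpha T)^* = \bar{\alpha} T^*$ for all $\alpha \in \mathbf{F}$ and all $T \in \mathcal{B}(V, W)$;

(c) $I^* = I$, where $I$ is the identity operator on $V$;

(d) $(S \circ T)^* = T^* \circ S^*$ for all $T \in \mathcal{B}(V, W)$ and $S \in \mathcal{B}(W, U)$.


Proof
(a) The proof of (a) is left to the reader as an exercise.
(b) Suppose $\alpha \in \mathbf{F}$ and $T \in \mathcal{B}(V, W)$. If $f \in V$ and $g \in W$, then
$$\langle f, (\alpha T)^*g \rangle = \langle \alpha Tf, g \rangle = \alpha \langle Tf, g \rangle = \alpha \langle f, T^*g \rangle = \langle f, \bar{\alpha} T^*g \rangle.$$
Thus $(\alpha T)^*g = \bar{\alpha} T^*g$, as desired.

(c) If $f, g \in V$, then
$$\langle f, I^*g \rangle = \langle If, g \rangle = \langle f, g \rangle.$$
Thus $I^*g = g$, as desired.

(d) Suppose $T \in \mathcal{B}(V, W)$ and $S \in \mathcal{B}(W, U)$. If $f \in V$ and $g \in U$, then
$$\langle f, (S \circ T)^*g \rangle = \langle (S \circ T)f, g \rangle = \langle S(Tf), g \rangle = \langle Tf, S^*g \rangle = \langle f, T^*(S^*g) \rangle.$$
Thus $(S \circ T)^*g = T^*(S^*g) = (T^* \circ S^*)(g)$. Hence $(S \circ T)^* = T^* \circ S^*$, as desired.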
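In the matrix setting of Example 10.10, where the adjoint is the conjugate transpose, the rules in 10.11 and 10.12 can be spot-checked. The sketch below uses small hand-chosen complex matrices (an illustrative assumption; the helper names are hypothetical). Its last assertion shows that for a nonreal $\alpha$ one has $(\alpha T)^* \ne \alpha T^*$, consistent with the remark that $T \mapsto T^*$ is not linear on complex Hilbert spaces.

```python
# Sketch: checking 10.11 and 10.12(b), (d) for matrices, where the adjoint is
# the conjugate transpose. Sizes and entries are arbitrary illustrative choices.

def conj_transpose(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def scale(alpha, A):
    return [[alpha * a for a in row] for row in A]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

T = [[1 + 2j, 0.5], [-1j, 3], [2, 1 - 1j]]   # T in B(F^2, F^3)
S = [[1j, 2, 0], [4, -1 + 1j, 0.5j]]        # S in B(F^3, F^2)
alpha = 2 - 3j

assert close(conj_transpose(conj_transpose(T)), T)                  # (T*)* = T
assert close(conj_transpose(scale(alpha, T)),
             scale(alpha.conjugate(), conj_transpose(T)))           # (aT)* = conj(a) T*
assert close(conj_transpose(matmul(S, T)),
             matmul(conj_transpose(T), conj_transpose(S)))          # (ST)* = T* S*
assert not close(conj_transpose(scale(alpha, T)),
                 scale(alpha, conj_transpose(T)))                   # (aT)* != a T* here
```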
Null Spaces and Ranges in Terms of Adjoints


The next result shows the relationship between the null space and the range of a linear 
map and its adjoint. The orthogonal complement of each subset of a Hilbert space is 
closed [see 8.40(a)]. However, the range of a bounded linear map on a Hilbert space 
need not be closed (see Example 10.15 or Exercises 9 and 10 for examples). Thus in 
parts (b) and (d) of the result below, we must take the closure of the range. 


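10.13 null space and range of $T^*$


Suppose $V$ and $W$ are Hilbert spaces and $T \in \mathcal{B}(V, W)$. Then

(a) $\operatorname{null} T^* = (\operatorname{range} T)^\perp$;

(b) $\overline{\operatorname{range} T^*} = (\operatorname{null} T)^\perp$;

(c) $\operatorname{null} T = (\operatorname{range} T^*)^\perp$;

(d) $\overline{\operatorname{range} T} = (\operatorname{null} T^*)^\perp$.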


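Proof We begin by proving (a). Let $g \in W$. Then
$$\begin{aligned} g \in \operatorname{null} T^* &\iff T^*g = 0 \\ &\iff \langle f, T^*g \rangle = 0 \text{ for all } f \in V \\ &\iff \langle Tf, g \rangle = 0 \text{ for all } f \in V \\ &\iff g \in (\operatorname{range} T)^\perp. \end{aligned}$$
Thus $\operatorname{null} T^* = (\operatorname{range} T)^\perp$, proving (a).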


If we take the orthogonal complement of both sides of (a), we get (d), where we 
have used 8.41. Replacing T with T* in (a) gives (c), where we have used 10.11. 
Finally, replacing T with T* in (d) gives (b). 


As a corollary of the result above, we have the following result, which gives a 
useful way to determine whether or not a linear map has a dense range. 


10.14 necessary and sufficient condition for dense range 


Suppose $V$ and $W$ are Hilbert spaces and $T \in \mathcal{B}(V, W)$. Then $T$ has dense range if and only if $T^*$ is injective.


Proof From 10.13(d) we see that $T$ has dense range if and only if $(\operatorname{null} T^*)^\perp = W$, which happens if and only if $\operatorname{null} T^* = \{0\}$, which happens if and only if $T^*$ is injective.
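In finite dimensions a subspace is dense only if it is the whole space, so 10.14 says that a square matrix is surjective if and only if its conjugate transpose is injective. The minimal sketch below uses a hand-chosen singular $3$-by-$3$ real matrix (an illustrative assumption), exhibits a nonzero $g$ with $T^*g = 0$, and checks that $g$ is orthogonal to $\operatorname{range} T$, as 10.13(a) predicts.

```python
# Sketch: 10.13(a) and 10.14 in F^3. T has rank 2, and g is a nonzero solution
# of T* g = 0, so g is orthogonal to range T and T cannot have dense range.
# The matrix and the vector were chosen by hand for this illustration.

T = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2]]          # third row = first row + second row, so rank T = 2
g = [1, 1, -1]           # T* g = 0 (T is real, so T* is the transpose)

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose_conj(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def inner(u, v):
    return sum(a * b.conjugate() for a, b in zip(u, v))

assert matvec(transpose_conj(T), g) == [0, 0, 0]   # g is in null T*
for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    assert inner(matvec(T, e), g) == 0             # T e is orthogonal to g, so range T is too
```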


The advantage of using the result above is that to determine whether or not a 
bounded linear map T between Hilbert spaces has a dense range, we need only 
determine whether or not 0 is the only solution to the equation T*g = 0. The next 
example illustrates this procedure. 
10.15 Example Volterra operator

The Volterra operator is the linear map $\mathcal{V}\colon L^2([0,1]) \to L^2([0,1])$ defined by
$$(\mathcal{V}f)(x) = \int_0^x f(y) \, dy$$
for $f \in L^2([0,1])$ and $x \in [0,1]$; here $dy$ means $d\lambda(y)$, where $\lambda$ is the usual Lebesgue measure on the interval $[0,1]$.

To show that $\mathcal{V}$ is a bounded linear map from $L^2([0,1])$ to $L^2([0,1])$, let $K$ be the function on $[0,1] \times [0,1]$ defined by
$$K(x, y) = \begin{cases} 1 & \text{if } x \ge y, \\ 0 & \text{if } x < y. \end{cases}$$
In other words, $K$ is the characteristic function of the triangle below the diagonal of the unit square. Clearly $K \in \mathcal{L}^2(\lambda \times \lambda)$ and $\mathcal{V} = \mathcal{I}_K$ as defined in 10.6. Thus $\mathcal{V}$ is a bounded linear map from $L^2([0,1])$ to $L^2([0,1])$ and $\|\mathcal{V}\| \le \frac{1}{\sqrt{2}}$ (by 10.8).

Because $\mathcal{V}^* = \mathcal{I}_{K^*}$ (by 10.9) and $K^*$ is the characteristic function of the closed triangle above the diagonal of the unit square, we see that
$$(\mathcal{V}^*f)(x) = \int_x^1 f(y) \, dy = \int_0^1 f(y) \, dy - \int_0^x f(y) \, dy \tag{10.16}$$
for $f \in L^2([0,1])$ and $x \in [0,1]$.

Vito Volterra (1860–1940) was a pioneer in developing functional analytic techniques to study integral equations.

Now we can show that $\mathcal{V}^*$ is injective. To do this, suppose $f \in L^2([0,1])$ and $\mathcal{V}^*f = 0$. Differentiating both sides of 10.16 with respect to $x$ and using the Lebesgue Differentiation Theorem (4.19), we conclude that $f = 0$. Hence $\mathcal{V}^*$ is injective. Thus the Volterra operator $\mathcal{V}$ has dense range (by 10.14).

Although $\operatorname{range} \mathcal{V}$ is dense in $L^2([0,1])$, it does not equal $L^2([0,1])$ (because every element of $\operatorname{range} \mathcal{V}$ is a continuous function on $[0,1]$ that vanishes at $0$). Thus the Volterra operator $\mathcal{V}$ has dense but not closed range in $L^2([0,1])$.
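A crude discretization makes 10.16 plausible numerically. The Python sketch below replaces $L^2([0,1])$ by function values at $N$ midpoints (an approximation chosen for illustration, not part of the text), so $\mathcal{V}$ becomes a lower-triangular matrix with entries $1/N$ and $\mathcal{V}^*$ becomes its transpose. It checks that $\mathcal{V}1 \approx x$ and $\mathcal{V}^*1 \approx 1 - x$ (each with error $1/(2N)$ on this grid), and that the discrete version of $\langle \mathcal{V}f, g \rangle = \langle f, \mathcal{V}^*g \rangle$ holds up to rounding.

```python
# Sketch: a midpoint-grid discretization of the Volterra operator V and of its
# adjoint V*, with kernels K(x, y) = 1 if y <= x (else 0) and K*(x, y) = K(y, x).
# The grid size N and the test functions are illustrative choices.

N = 1000
grid = [(i + 0.5) / N for i in range(N)]   # midpoints of [i/N, (i+1)/N]

def volterra(f_vals):
    """(Vf)(x_i) ~ (1/N) * sum of f(x_j) over j <= i (a running sum)."""
    out, acc = [], 0.0
    for v in f_vals:
        acc += v / N
        out.append(acc)
    return out

def volterra_adjoint(f_vals):
    """(V*f)(x_i) ~ (1/N) * sum of f(x_j) over j >= i."""
    return volterra(f_vals[::-1])[::-1]

one = [1.0] * N
Vf = volterra(one)                 # approximates x
Vstar_f = volterra_adjoint(one)    # approximates 1 - x
assert max(abs(a - x) for a, x in zip(Vf, grid)) < 1e-3
assert max(abs(a - (1 - x)) for a, x in zip(Vstar_f, grid)) < 1e-3

# Discrete version of <Vf, g> = <f, V*g> for real-valued f and g.
f = [x * x for x in grid]
g = [1 - 2 * x for x in grid]
lhs = sum(a * b for a, b in zip(volterra(f), g)) / N
rhs = sum(a * b for a, b in zip(f, volterra_adjoint(g))) / N
assert abs(lhs - rhs) < 1e-10
```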


Invertibility of Operators 


Linear maps from a vector space to itself are so important that they get a special name 
and special notation. 


10.17 Definition operator; B(V) 


• An operator is a linear map from a vector space to itself.


• If $V$ is a normed vector space, then $\mathcal{B}(V)$ denotes the normed vector space of bounded operators on $V$. In other words, $\mathcal{B}(V) = \mathcal{B}(V, V)$.
10.18 Definition invertible; $T^{-1}$


• An operator $T$ on a vector space $V$ is called invertible if $T$ is a one-to-one and surjective linear map of $V$ onto $V$.


• Equivalently, an operator $T\colon V \to V$ is invertible if and only if there exists an operator $T^{-1}\colon V \to V$ such that $T^{-1} \circ T = T \circ T^{-1} = I$.


The second bullet point above is equivalent to the first bullet point because if a linear map $T\colon V \to V$ is one-to-one and surjective, then the inverse function $T^{-1}\colon V \to V$ is automatically linear (as you should verify).

Also, if $V$ is a Banach space and $T$ is a bounded operator on $V$ that is invertible, then the inverse $T^{-1}$ is automatically bounded, as follows from the Bounded Inverse Theorem (6.83).

The next result shows that inverses and adjoints work well together. In the proof, we use the common convention of writing composition of linear maps with the same notation as multiplication. In other words, if $S$ and $T$ are linear maps such that $S \circ T$ makes sense, then from now on
$$ST = S \circ T.$$


10.19 inverse of the adjoint equals adjoint of the inverse


A bounded operator $T$ on a Hilbert space is invertible if and only if $T^*$ is invertible. Furthermore, if $T$ is invertible, then $(T^*)^{-1} = (T^{-1})^*$.


Proof First suppose $T$ is invertible. Taking the adjoint of all three sides of the equation $T^{-1}T = TT^{-1} = I$, we get
$$T^*(T^{-1})^* = (T^{-1})^*T^* = I,$$
which implies that $T^*$ is invertible and $(T^*)^{-1} = (T^{-1})^*$.

Now suppose $T^*$ is invertible. Then by the direction just proved, $(T^*)^*$ is invertible. Because $(T^*)^* = T$, this implies that $T$ is invertible, completing the proof.


Norms work well with the composition of linear maps, as shown in the next result. 


10.20 norm of a composition of linear maps


Suppose $U, V, W$ are normed vector spaces, $T \in \mathcal{B}(U, V)$, and $S \in \mathcal{B}(V, W)$. Then
$$\|ST\| \le \|S\| \, \|T\|.$$


Proof If $f \in U$, then
$$\|(ST)f\| = \|S(Tf)\| \le \|S\| \, \|Tf\| \le \|S\| \, \|T\| \, \|f\|.$$
Thus $\|ST\| \le \|S\| \, \|T\|$, as desired.
Unlike linear maps from one vector space to a different vector space, operators on the same vector space can be composed with each other and raised to powers.


10.21 Definition $T^k$


Suppose $T$ is an operator on a vector space $V$.

• For $k \in \mathbf{Z}^+$, the operator $T^k$ is defined by $T^k = \underbrace{TT \cdots T}_{k \text{ times}}$.

• $T^0$ is defined to be the identity operator $I\colon V \to V$.


You should verify that powers of an operator satisfy the usual arithmetic rules: $T^j T^k = T^{j+k}$ and $(T^j)^k = T^{jk}$ for $j, k \in \mathbf{Z}^+$. Also, if $V$ is a normed vector space and $T \in \mathcal{B}(V)$, then
$$\|T^k\| \le \|T\|^k$$
for every $k \in \mathbf{Z}^+$, as follows from using induction on 10.20.
Recall that if $z \in \mathbf{C}$ with $|z| < 1$, then the formula for the sum of a geometric series shows that
$$\frac{1}{1 - z} = \sum_{k=0}^\infty z^k.$$
The next result shows that this formula carries over to operators on Banach spaces.


10.22 operators in the open unit ball centered at the identity are invertible


If $T$ is a bounded operator on a Banach space and $\|T\| < 1$, then $I - T$ is invertible and
$$(I - T)^{-1} = \sum_{k=0}^\infty T^k.$$


Proof Suppose $T$ is a bounded operator on a Banach space $V$ and $\|T\| < 1$. Then
$$\sum_{k=0}^\infty \|T^k\| \le \sum_{k=0}^\infty \|T\|^k = \frac{1}{1 - \|T\|} < \infty.$$
Hence 6.47 and 6.41 imply that the infinite sum $\sum_{k=0}^\infty T^k$ converges in $\mathcal{B}(V)$. Now
$$(I - T) \sum_{k=0}^\infty T^k = \lim_{n \to \infty} (I - T) \sum_{k=0}^n T^k = \lim_{n \to \infty} (I - T^{n+1}) = I, \tag{10.23}$$
where the last equality holds because $\|T^{n+1}\| \le \|T\|^{n+1}$ and $\|T\| < 1$. Similarly,
$$\Bigl(\sum_{k=0}^\infty T^k\Bigr)(I - T) = \lim_{n \to \infty} \Bigl(\sum_{k=0}^n T^k\Bigr)(I - T) = \lim_{n \to \infty} (I - T^{n+1}) = I. \tag{10.24}$$
Equations 10.23 and 10.24 imply that $I - T$ is invertible and $(I - T)^{-1} = \sum_{k=0}^\infty T^k$.
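The proof of 10.22 is constructive: the partial sums $\sum_{k=0}^n T^k$ converge to $(I - T)^{-1}$. The minimal Python sketch below applies this to a hand-chosen real $2$-by-$2$ matrix (an illustrative assumption), certifies $\|T\| < 1$ through the bound $\|T\| \le (\sum_{i,j} |T(i,j)|^2)^{1/2}$ from Example 10.10, and checks that the partial sum is a two-sided inverse of $I - T$ up to rounding, mirroring 10.23 and 10.24.

```python
# Sketch: the geometric series of 10.22 for a hand-chosen 2-by-2 real matrix T.
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

identity = [[1.0, 0.0], [0.0, 1.0]]
T = [[0.2, -0.3], [0.1, 0.4]]
# ||T|| <= (sum of T(i, j)^2)^(1/2) = sqrt(0.3) < 1, so 10.22 applies.
assert math.sqrt(sum(t * t for row in T for t in row)) < 1

partial, power = identity, identity          # partial = I + T + ... + T^k
for _ in range(60):
    power = matmul(power, T)
    partial = add(partial, power)

I_minus_T = [[identity[i][j] - T[i][j] for j in range(2)] for i in range(2)]
for prod in (matmul(I_minus_T, partial), matmul(partial, I_minus_T)):
    # Both products equal I - T^61, which is the identity up to rounding.
    assert all(abs(prod[i][j] - identity[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

For comparison, $I - T = \begin{pmatrix} 0.8 & 0.3 \\ -0.1 & 0.6 \end{pmatrix}$ has determinant $0.51$, so its inverse is $\frac{1}{0.51}\begin{pmatrix} 0.6 & -0.3 \\ 0.1 & 0.8 \end{pmatrix}$, which the partial sum matches to within rounding.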
Now we use the previous result to show that the set of invertible operators on a Banach space is open.


10.25 invertible operators form an open set


Suppose $V$ is a Banach space. Then $\{T \in \mathcal{B}(V) : T \text{ is invertible}\}$ is an open subset of $\mathcal{B}(V)$.


Proof Suppose $T \in \mathcal{B}(V)$ is invertible. Suppose $S \in \mathcal{B}(V)$ and
$$\|T - S\| < \frac{1}{\|T^{-1}\|}.$$
Then
$$\|I - T^{-1}S\| = \|T^{-1}(T - S)\| \le \|T^{-1}\| \, \|T - S\| < 1.$$
Hence 10.22 implies that $I - (I - T^{-1}S)$ is invertible; in other words, $T^{-1}S$ is invertible.

Now $S = T(T^{-1}S)$. Thus $S$ is the product of two invertible operators, which implies that $S$ is invertible with $S^{-1} = (T^{-1}S)^{-1}T^{-1}$.

We have shown that every element of the open ball of radius $\|T^{-1}\|^{-1}$ centered at $T$ is invertible. Thus the set of invertible elements of $\mathcal{B}(V)$ is open.


10.26 Definition left invertible; right invertible


Suppose $T$ is a bounded operator on a Banach space $V$.

• $T$ is called left invertible if there exists $S \in \mathcal{B}(V)$ such that $ST = I$.

• $T$ is called right invertible if there exists $S \in \mathcal{B}(V)$ such that $TS = I$.


One of the wonderful theorems of linear algebra states that left invertibility and 
right invertibility and invertibility are all equivalent to each other for operators on 
a finite-dimensional vector space. The next example shows that this result fails on 
infinite-dimensional Hilbert spaces. 


10.27 Example left invertibility is not equivalent to right invertibility


Define the right shift $T\colon \ell^2 \to \ell^2$ and the left shift $S\colon \ell^2 \to \ell^2$ by
$$T(a_1, a_2, a_3, \ldots) = (0, a_1, a_2, a_3, \ldots)$$
and
$$S(a_1, a_2, a_3, \ldots) = (a_2, a_3, a_4, \ldots).$$
Because $ST = I$, we see that $T$ is left invertible and $S$ is right invertible. However, $T$ is neither invertible nor right invertible because it is not surjective, and $S$ is neither invertible nor left invertible because it is not injective.
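The shifts of Example 10.27 are easy to model on finitely supported sequences. In the sketch below, a Python list stands for a sequence in $\ell^2$ whose remaining entries are $0$ (a representation chosen only for illustration; the helper names are hypothetical). It confirms $ST = I$, shows $TS \ne I$, and exhibits the failures of surjectivity and injectivity used in the example.

```python
# Sketch: the right shift T and left shift S of Example 10.27 acting on
# finitely supported sequences in l^2, stored as Python lists; entries beyond
# the end of a list are understood to be 0 (an illustrative representation).

def right_shift(a):
    return [0] + list(a)

def left_shift(a):
    return list(a[1:])

def same_sequence(a, b):
    """Compare two finitely supported sequences, ignoring trailing zeros."""
    n = max(len(a), len(b))
    return list(a) + [0] * (n - len(a)) == list(b) + [0] * (n - len(b))

a = [5, -2, 7, 1]
assert same_sequence(left_shift(right_shift(a)), a)       # ST = I
assert not same_sequence(right_shift(left_shift(a)), a)   # TS != I: the first entry is lost

assert same_sequence(left_shift([1]), [])   # S e_1 = 0, so S is not injective
assert right_shift([3, 4])[0] == 0          # every T b starts with 0, so e_1 is not in range T
```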
The result 10.29 below gives equivalent conditions for an operator on a Hilbert
space to be left invertible. On finite-dimensional vector spaces, left invertibility 
is equivalent to injectivity. The example below shows that this fails on infinite- 
dimensional Hilbert spaces. Thus we cannot eliminate the closed range requirement 
in part (c) of 10.29. 


10.28 Example injective but not left invertible

Define $T\colon \ell^2 \to \ell^2$ by
$$T(a_1, a_2, a_3, \ldots) = \Bigl(a_1, \frac{a_2}{2}, \frac{a_3}{3}, \ldots\Bigr).$$
Then $T$ is an injective bounded operator on $\ell^2$.

Suppose $S$ is an operator on $\ell^2$ such that $ST = I$. For $n \in \mathbf{Z}^+$, let $e_n \in \ell^2$ be the vector with $1$ in the $n^{\text{th}}$ slot and $0$ elsewhere. Then
$$Se_n = S(nTe_n) = n(ST)(e_n) = ne_n.$$
The equation above implies that $S$ is unbounded. Thus $T$ is not left invertible, even though $T$ is injective.
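The computation $Se_n = ne_n$ in Example 10.28 can be seen concretely. The sketch below models finitely supported sequences as Python lists (an illustrative assumption) and applies the map that $ST = I$ forces on $\operatorname{range} T$, namely multiplying the $k^{\text{th}}$ entry by $k$; the helper names, such as `S_on_range`, are hypothetical. The ratios $\|Se_n\| / \|e_n\| = n$ are unbounded, so no bounded $S$ can satisfy $ST = I$.

```python
# Sketch for Example 10.28: T(a_1, a_2, ...) = (a_1, a_2/2, a_3/3, ...) on
# finitely supported sequences (list index k holds a_{k+1}). On range T the
# only possible left inverse multiplies the k-th entry by k.
import math

def T(a):
    return [x / (k + 1) for k, x in enumerate(a)]

def S_on_range(b):
    """Forced by ST = I on range T: multiply the k-th entry by k."""
    return [x * (k + 1) for k, x in enumerate(b)]

def norm(a):
    return math.sqrt(sum(abs(x) ** 2 for x in a))

def e(n):
    """The standard basis vector e_n (1 in slot n, 0 elsewhere)."""
    return [0] * (n - 1) + [1]

a = [3.0, -1.0, 4.0, 1.0, -5.0]
assert all(abs(x - y) < 1e-12 for x, y in zip(S_on_range(T(a)), a))   # ST = I on this a
ratios = [norm(S_on_range(e(n))) / norm(e(n)) for n in (1, 10, 100, 1000)]
assert ratios == [1.0, 10.0, 100.0, 1000.0]   # S e_n = n e_n, so S is unbounded
```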
10.29 left invertibility

Suppose $V$ is a Hilbert space and $T \in \mathcal{B}(V)$. Then the following are equivalent:


(a) $T$ is left invertible.


(b) there exists $\alpha \in (0, \infty)$ such that $\|f\| \le \alpha \|Tf\|$ for all $f \in V$.


(c) $T$ is injective and has closed range.


(d) $T^*T$ is invertible.


Proof First suppose (a) holds. Thus there exists $S \in \mathcal{B}(V)$ such that $ST = I$. If $f \in V$, then
$$\|f\| = \|S(Tf)\| \le \|S\| \, \|Tf\|.$$
Thus (b) holds with $\alpha = \|S\|$, proving that (a) implies (b).

Now suppose (b) holds. Thus there exists $\alpha \in (0, \infty)$ such that
$$\|f\| \le \alpha \|Tf\| \quad \text{for all } f \in V. \tag{10.30}$$
The inequality above shows that if $f \in V$ and $Tf = 0$, then $f = 0$. Thus $T$ is injective. To show that $T$ has closed range, suppose $f_1, f_2, \ldots$ is a sequence in $V$ such that $Tf_1, Tf_2, \ldots$ converges in $V$ to some $g \in V$. Thus the sequence $Tf_1, Tf_2, \ldots$ is a Cauchy sequence in $V$. The inequality 10.30 then implies that $f_1, f_2, \ldots$ is a Cauchy sequence in $V$. Thus $f_1, f_2, \ldots$ converges in $V$ to some $f \in V$, which implies (because $T$ is continuous) that $Tf = g$. Hence $g \in \operatorname{range} T$, completing the proof that $T$ has closed range, and completing the proof that (b) implies (c).
Now suppose (c) holds, so $T$ is injective and has closed range. We want to prove that (a) holds. Let $R\colon \operatorname{range} T \to V$ be the inverse of the one-to-one linear function $f \mapsto Tf$ that maps $V$ onto $\operatorname{range} T$. Because $\operatorname{range} T$ is a closed subspace of $V$ and thus is a Banach space [by 6.16(b)], the Bounded Inverse Theorem (6.83) implies that $R$ is a bounded linear map. Let $P$ denote the orthogonal projection of $V$ onto the closed subspace $\operatorname{range} T$. Define $S\colon V \to V$ by
$$Sg = R(Pg).$$
Then for each $g \in V$, we have
$$\|Sg\| = \|R(Pg)\| \le \|R\| \, \|Pg\| \le \|R\| \, \|g\|,$$
where the last inequality comes from 8.37(d). The inequality above implies that $S$ is a bounded operator on $V$. If $f \in V$, then
$$S(Tf) = R(P(Tf)) = R(Tf) = f.$$
Thus $ST = I$, which means that $T$ is left invertible, completing the proof that (c) implies (a).

At this stage of the proof we know that (a), (b), and (c) are equivalent. To prove that one of these implies (d), suppose (b) holds. Squaring the inequality in (b), we see that if $f \in V$, then
$$\|f\|^2 \le \alpha^2 \|Tf\|^2 = \alpha^2 \langle T^*Tf, f \rangle \le \alpha^2 \|T^*Tf\| \, \|f\|,$$
which implies that
$$\|f\| \le \alpha^2 \|T^*Tf\|.$$
In other words, (b) holds with $T$ replaced by $T^*T$ (and $\alpha$ replaced by $\alpha^2$). By the equivalence we already proved between (a) and (b), we conclude that $T^*T$ is left invertible. Thus there exists $S \in \mathcal{B}(V)$ such that $S(T^*T) = I$. Taking adjoints of both sides of the last equation shows that $(T^*T)S^* = I$. Thus $T^*T$ is also right invertible, which implies that $T^*T$ is invertible. Thus (b) implies (d).

Finally, suppose (d) holds, so $T^*T$ is invertible. Hence there exists $S \in \mathcal{B}(V)$ such that $I = S(T^*T) = (ST^*)T$. Thus $T$ is left invertible, showing that (d) implies (a), completing the proof that (a), (b), (c), and (d) are equivalent.


You may be familiar with the finite-dimensional result that right invertibility is equivalent to surjectivity. The next result shows that this equivalence also holds on infinite-dimensional Hilbert spaces.


10.31 right invertibility


Suppose $V$ is a Hilbert space and $T \in \mathcal{B}(V)$. Then the following are equivalent:


(a) $T$ is right invertible.
(b) $T$ is surjective.
(c) $TT^*$ is invertible.
Proof Taking adjoints shows that an operator is right invertible if and only if its adjoint is left invertible. Thus the equivalence of (a) and (c) in this result follows immediately from the equivalence of (a) and (d) in 10.29 applied to $T^*$ instead of $T$.

Suppose (a) holds, so $T$ is right invertible. Hence there exists $S \in \mathcal{B}(V)$ such that $TS = I$. Thus $T(Sf) = f$ for every $f \in V$, which implies that $T$ is surjective, completing the proof that (a) implies (b).

To prove that (b) implies (a), suppose $T$ is surjective. Define $R\colon (\operatorname{null} T)^\perp \to V$ by $R = T|_{(\operatorname{null} T)^\perp}$. Clearly $R$ is injective because
$$\operatorname{null} R = (\operatorname{null} T)^\perp \cap (\operatorname{null} T) = \{0\}.$$
If $f \in V$, then $f = g + h$ for some $g \in \operatorname{null} T$ and some $h \in (\operatorname{null} T)^\perp$ (by 8.43); thus $Tf = Th = Rh$, which implies that $\operatorname{range} T = \operatorname{range} R$. Because $T$ is surjective, this implies that $\operatorname{range} R = V$. In other words, $R$ is a continuous injective linear map of $(\operatorname{null} T)^\perp$ onto $V$. The Bounded Inverse Theorem (6.83) now implies that $R^{-1}\colon V \to (\operatorname{null} T)^\perp$ is a bounded linear map on $V$. We have $TR^{-1} = I$. Thus $T$ is right invertible, completing the proof that (b) implies (a).


EXERCISES 10A 


1 Define T: (2 — @ by T(a1,a2,...) = (0,a4,42,...). Find a formula for T*. 


2 Suppose V is a Hilbert space, U is a closed subspace of V, and T: U — V is 
defined by Tf = f. Describe the linear operator T*: V — U. 


3 Suppose V and W are Hilbert spaces and g € V,h € W. Define T € B(V,W) 
by Tf = (f,g)h. Find a formula for T*. 


4 Suppose V and W are Hilbert spaces and T ∈ B(V, W) has finite-dimensional range. Prove that T* also has finite-dimensional range.


5 Prove or give a counterexample: If V is a Hilbert space and T: V → V is a bounded linear map such that dim null T < ∞, then dim null T* < ∞.


6 Suppose T is a bounded linear map from a Hilbert space V to a Hilbert space W. Prove that ‖T*T‖ = ‖T‖².
[This formula for ‖T*T‖ leads to the important subject of C*-algebras.]


7 Suppose V is a Hilbert space and Inv(V) is the set of invertible bounded operators on V. Think of Inv(V) as a metric space with the metric it inherits as a subset of B(V). Show that T ↦ T⁻¹ is a continuous function from Inv(V) to Inv(V).


8 Suppose T is a bounded operator on a Hilbert space. 


(a) Prove that T is left invertible if and only if T* is right invertible. 
(b) Prove that T is invertible if and only if T is both left and right invertible. 


9 Suppose b₁, b₂, ... is a bounded sequence in F. Define a bounded linear map T: ℓ² → ℓ² by
T(a₁, a₂, ...) = (a₁b₁, a₂b₂, ...).
(a) Find a formula for T*.
(b) Show that T is injective if and only if bₖ ≠ 0 for every k ∈ Z⁺.
(c) Show that T has dense range if and only if bₖ ≠ 0 for every k ∈ Z⁺.
(d) Show that T has closed range if and only if
inf{|bₖ| : k ∈ Z⁺ and bₖ ≠ 0} > 0.
(e) Show that T is invertible if and only if
inf{|bₖ| : k ∈ Z⁺} > 0.

10 Suppose h ∈ ℒ^∞(R) and Mₕ: L²(R) → L²(R) is the bounded operator defined by Mₕf = fh.
(a) Show that Mₕ is injective if and only if |{x ∈ R : h(x) = 0}| = 0.
(b) Find a necessary and sufficient condition (in terms of h) for Mₕ to have dense range.
(c) Find a necessary and sufficient condition (in terms of h) for Mₕ to have closed range.
(d) Find a necessary and sufficient condition (in terms of h) for Mₕ to be invertible.

11 (a) Prove or give a counterexample: If T is a bounded operator on a Hilbert space such that T and T* are both injective, then T is invertible.
(b) Prove or give a counterexample: If T is a bounded operator on a Hilbert space such that T and T* are both surjective, then T is invertible.

12 Define T: ℓ² → ℓ² by T(a₁, a₂, a₃, ...) = (a₂, a₃, a₄, ...). Suppose α ∈ F.
(a) Prove that T − αI is injective if and only if |α| ≥ 1.
(b) Prove that T − αI is invertible if and only if |α| > 1.
(c) Prove that T − αI is surjective if and only if |α| ≠ 1.
(d) Prove that T − αI is left invertible if and only if |α| > 1.

13 Suppose V is a Hilbert space.
(a) Show that {T ∈ B(V) : T is left invertible} is an open subset of B(V).
(b) Show that {T ∈ B(V) : T is right invertible} is an open subset of B(V).

14 Suppose T is a bounded operator on a Hilbert space V.
(a) Prove that T is invertible if and only if T has a unique left inverse. In other words, prove that T is invertible if and only if there exists a unique S ∈ B(V) such that ST = I.
(b) Prove that T is invertible if and only if T has a unique right inverse. In other words, prove that T is invertible if and only if there exists a unique S ∈ B(V) such that TS = I.
10B Spectrum


Spectrum of an Operator 


The following definitions play key roles in operator theory. 


10.32 Definition spectrum; sp(T); eigenvalue 


Suppose T is a bounded operator on a Banach space V.
• A number α ∈ F is called an eigenvalue of T if T − αI is not injective.
• A nonzero vector f ∈ V is called an eigenvector of T corresponding to an eigenvalue α ∈ F if

$$Tf = \alpha f.$$

• The spectrum of T is denoted sp(T) and is defined by

$$\operatorname{sp}(T) = \{\alpha \in \mathbf{F} : T - \alpha I \text{ is not invertible}\}.$$


If T − αI is not injective, then T − αI is not invertible. Thus the set of eigenvalues of a bounded operator T is contained in the spectrum of T. If V is a finite-dimensional Banach space and T ∈ B(V), then T − αI is not injective if and only if T − αI is not invertible. Thus if T is an operator on a finite-dimensional Banach space, then the spectrum of T equals the set of eigenvalues of T.

However, on infinite-dimensional Banach spaces, the spectrum of an operator does 
not necessarily equal the set of eigenvalues, as shown in the next example. 


10.33 Example eigenvalues and spectrum 
Verifying all the assertions in this example should help solidify your understanding 
of the definition of the spectrum. 


• Suppose b₁, b₂, ... is a bounded sequence in F. Define a bounded linear map T: ℓ² → ℓ² by
T(a₁, a₂, ...) = (a₁b₁, a₂b₂, ...).
Then the set of eigenvalues of T equals {bₖ : k ∈ Z⁺} and the spectrum of T equals the closure of {bₖ : k ∈ Z⁺}.

• Suppose h ∈ ℒ^∞(R). Define a bounded linear map Mₕ: L²(R) → L²(R) by
Mₕf = fh.
Then α ∈ F is an eigenvalue of Mₕ if and only if |{t ∈ R : h(t) = α}| > 0. Also, α ∈ sp(Mₕ) if and only if |{t ∈ R : |h(t) − α| < ε}| > 0 for all ε > 0.

• Define the right shift T: ℓ² → ℓ² and the left shift S: ℓ² → ℓ² by
T(a₁, a₂, a₃, ...) = (0, a₁, a₂, a₃, ...) and S(a₁, a₂, a₃, ...) = (a₂, a₃, a₄, ...).
Then T has no eigenvalues, and sp(T) = {α ∈ F : |α| ≤ 1}. Also, the set of eigenvalues of S is the open set {α ∈ F : |α| < 1}, and the spectrum of S is the closed set {α ∈ F : |α| ≤ 1}.
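For the left shift S, an eigenvector for α with |α| < 1 is the geometric sequence f = (1, α, α², ...), which lies in ℓ² exactly when |α| < 1 and satisfies ‖f‖² = 1/(1 − |α|²). The Python sketch below checks both facts on a truncation of f; the cutoff `N`, the sample value of α, and the helper names are illustrative choices, not part of the text.

```python
def geometric_vector(alpha, n):
    """First n terms of f = (1, alpha, alpha^2, ...)."""
    return [alpha ** k for k in range(n)]


def left_shift(a):
    """S(a1, a2, a3, ...) = (a2, a3, ...)."""
    return a[1:]


alpha = 0.5 + 0.3j        # any alpha with |alpha| < 1
N = 200                   # truncation length (illustrative)
f = geometric_vector(alpha, N)

# Sf and alpha * f agree in every coordinate that the truncation keeps.
Sf = left_shift(f)
alpha_f = [alpha * x for x in f[:-1]]
max_gap = max(abs(x - y) for x, y in zip(Sf, alpha_f))

# Partial sums of ||f||^2 approach 1 / (1 - |alpha|^2), so f is in l^2.
norm_sq = sum(abs(x) ** 2 for x in f)
limit = 1 / (1 - abs(alpha) ** 2)
print(max_gap, norm_sq, limit)
```

For |α| ≥ 1 the same sequence has terms of modulus at least 1, so it is not in ℓ², which is why the eigenvalues of S form the open disk.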
If α is an eigenvalue of an operator T ∈ B(V) and f is an eigenvector of T corresponding to α, then

$$\|T\|\,\|f\| \ge \|Tf\| = \|\alpha f\| = |\alpha|\,\|f\|,$$

which implies that |α| ≤ ‖T‖ (because f ≠ 0). The next result states that the same inequality holds for elements of sp(T).


10.34 T − αI is invertible for |α| large


Suppose T is a bounded operator on a Banach space. Then
(a) sp(T) ⊂ {α ∈ F : |α| ≤ ‖T‖};
(b) T − αI is invertible for all α ∈ F with |α| > ‖T‖;
(c) lim_{|α|→∞} ‖(T − αI)⁻¹‖ = 0.


Proof We begin by proving (b). Suppose α ∈ F and |α| > ‖T‖. Then

$$T - \alpha I = -\alpha\Bigl(I - \frac{T}{\alpha}\Bigr). \tag{10.35}$$

Because ‖T/α‖ < 1, the equation above and 10.22 imply that T − αI is invertible, completing the proof of (b).

Using the definition of spectrum, (a) now follows immediately from (b). 

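To prove (c), again suppose α ∈ F and |α| > ‖T‖. Then 10.35 and 10.22 imply

$$(T - \alpha I)^{-1} = -\frac{1}{\alpha}\sum_{k=0}^{\infty}\frac{T^k}{\alpha^k}.$$

Thus

$$\|(T - \alpha I)^{-1}\| \le \frac{1}{|\alpha|}\sum_{k=0}^{\infty}\Bigl(\frac{\|T\|}{|\alpha|}\Bigr)^{k} = \frac{1}{|\alpha| - \|T\|}.$$

The inequality above implies (c), completing the proof.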
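The series (T − αI)⁻¹ = −(1/α)Σₖ(T/α)^k and the bound 1/(|α| − ‖T‖) can be tested numerically in finite dimensions. The sketch below is plain Python with 2-by-2 complex matrices stored as nested lists; the helper names, the matrix, and the number of terms are illustrative assumptions. It sums the first terms of the series, checks that the result inverts T − αI, and compares operator norms, computing the norm of a 2-by-2 matrix A as the square root of the largest eigenvalue of A*A.

```python
import math


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def shift(T, alpha):
    """T - alpha I."""
    return [[T[i][j] - (alpha if i == j else 0) for j in range(2)]
            for i in range(2)]


def op_norm(A):
    """Operator 2-norm of a 2x2 matrix: sqrt of the largest eigenvalue of A*A."""
    AH = [[A[j][i].conjugate() for j in range(2)] for i in range(2)]
    M = matmul(AH, A)                       # Hermitian, positive semidefinite
    tr = (M[0][0] + M[1][1]).real
    det = (M[0][0] * M[1][1] - M[0][1] * M[1][0]).real
    disc = max(tr * tr - 4 * det, 0.0)      # clamp tiny negative rounding
    return math.sqrt((tr + math.sqrt(disc)) / 2)


def neumann_resolvent(T, alpha, terms=60):
    """Partial sum of -(1/alpha) * sum_{k<terms} (T/alpha)^k; needs |alpha| > ||T||."""
    T_over_alpha = [[x / alpha for x in row] for row in T]
    power = [[1 + 0j, 0j], [0j, 1 + 0j]]    # (T/alpha)^0 = I
    total = [[0j, 0j], [0j, 0j]]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(2)] for i in range(2)]
        power = matmul(power, T_over_alpha)
    return [[-x / alpha for x in row] for row in total]


T = [[1 + 0j, 2 + 0j], [0j, -1 + 0j]]       # ||T|| = 1 + sqrt(2)
alpha = 4 + 0j
R = neumann_resolvent(T, alpha)
check = matmul(shift(T, alpha), R)          # should be close to I
print(op_norm(R), 1 / (abs(alpha) - op_norm(T)))
```

Here ‖T/α‖ ≈ 0.6, so sixty terms leave a remainder far below rounding level; values of α closer to ‖T‖ need more terms.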


The set of eigenvalues of a bounded operator on a Hilbert space can be any 
bounded subset of F, even a nonmeasurable set (see Exercise 3). In contrast, the next 
result shows that the spectrum of a bounded operator is a closed subset of F. This 
result provides one indication that the spectrum of an operator may be a more useful 
set than the set of eigenvalues. 
10.36 spectrum is closed


The spectrum of a bounded operator on a Banach space is a closed subset of F. 


Proof Suppose T is a bounded operator on a Banach space V. Suppose α₁, α₂, ... is a sequence in sp(T) that converges to some α ∈ F. Thus each T − αₙI is not invertible and

$$\lim_{n\to\infty}(T - \alpha_n I) = T - \alpha I.$$

The set of noninvertible elements of B(V) is a closed subset of B(V) (by 10.25). Hence the equation above implies that T − αI is not invertible. In other words, α ∈ sp(T), which implies that sp(T) is closed.


Our next result provides the key tool used in proving that the spectrum of a 
bounded operator on a nonzero complex Hilbert space is nonempty (see 10.38). The 
statement of the next result and the proofs of the next two results use a bit of basic 
complex analysis. Because sp(T) is a closed subset of C (by 10.36), C \ sp(T) is 
an open subset of C and thus it makes sense to ask whether the function in the result 
below is analytic. 

To keep things simple, the next two results are stated for complex Hilbert spaces. 
See Exercise 6 for the analogous results for complex Banach spaces. 


10.37 analyticity of (T − αI)⁻¹


Suppose T is a bounded operator on a complex Hilbert space V. Then the function

$$\alpha \mapsto \langle (T - \alpha I)^{-1} f, g\rangle$$

is analytic on C \ sp(T) for every f, g ∈ V.


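Proof Suppose β ∈ C \ sp(T). Then for α ∈ C with |α − β| < 1/‖(T − βI)⁻¹‖, we see from 10.22 that I − (α − β)(T − βI)⁻¹ is invertible and

$$\bigl(I - (\alpha - \beta)(T - \beta I)^{-1}\bigr)^{-1} = \sum_{k=0}^{\infty} (\alpha - \beta)^k \bigl((T - \beta I)^{-1}\bigr)^k.$$

Because (T − βI)(I − (α − β)(T − βI)⁻¹) = T − αI, multiplying both sides of the equation above on the right by (T − βI)⁻¹ and using the equation A⁻¹B⁻¹ = (BA)⁻¹ for invertible operators A and B, we get

$$(T - \alpha I)^{-1} = \sum_{k=0}^{\infty} (\alpha - \beta)^k \bigl((T - \beta I)^{-1}\bigr)^{k+1}.$$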
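Thus for f, g ∈ V, we have

$$\langle (T - \alpha I)^{-1} f, g\rangle = \sum_{k=0}^{\infty} \bigl\langle \bigl((T - \beta I)^{-1}\bigr)^{k+1} f, g\bigr\rangle (\alpha - \beta)^k.$$

The equation above shows that the function α ↦ ⟨(T − αI)⁻¹f, g⟩ has a power series expansion as powers of α − β for α near β. Thus this function is analytic near β.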
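In finite dimensions the expansion (T − αI)⁻¹ = Σₖ (α − β)^k ((T − βI)⁻¹)^{k+1} can be checked directly. The sketch below (plain Python, 2-by-2 complex matrices as nested lists, hypothetical helper names) takes the rotation matrix T(x, y) = (−y, x), whose spectrum over C is {i, −i}, computes (T − βI)⁻¹ by the adjugate formula, and compares a partial sum of the series with (T − αI)⁻¹ for α close to β. Taking the inner product of each entry-wise identity with fixed vectors f, g gives the scalar power series used in the proof.

```python
def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(A):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def shift(T, z):
    """T - zI."""
    return [[T[i][j] - (z if i == j else 0) for j in range(2)] for i in range(2)]


def resolvent_series(T, beta, alpha, terms=80):
    """Partial sum of sum_k (alpha - beta)^k R^(k+1), where R = (T - beta I)^(-1)."""
    R = inv2(shift(T, beta))
    power = R                                # R^(k+1), starting at k = 0
    total = [[0j, 0j], [0j, 0j]]
    for k in range(terms):
        c = (alpha - beta) ** k
        total = [[total[i][j] + c * power[i][j] for j in range(2)] for i in range(2)]
        power = matmul(power, R)
    return total


T = [[0j, -1 + 0j], [1 + 0j, 0j]]            # rotation by a right angle
beta = 0.5 + 0j                              # a point outside sp(T) = {i, -i}
alpha = 0.5 + 0.2j                           # |alpha - beta| < 1/||(T - beta I)^(-1)||
approx = resolvent_series(T, beta, alpha)
exact = inv2(shift(T, alpha))
error = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(error)
```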
A major result in finite-dimensional linear algebra states that every operator on a nonzero finite-dimensional complex vector space has an eigenvalue. We have seen examples showing that this result does not extend to bounded operators on complex Hilbert spaces. However, the next result is an excellent substitute. Although a bounded operator on a nonzero complex Hilbert space need not have an eigenvalue, the next result shows that for each such operator T, there exists α ∈ C such that T − αI is not invertible.


The spectrum of a bounded operator on a nonzero real Hilbert space can be the empty set. This can happen even in finite dimensions, where an operator on R² might have no eigenvalues. Thus the restriction in the next result to the complex case cannot be removed.


10.38 spectrum is nonempty 


The spectrum of a bounded operator on a complex nonzero Hilbert space is a 
nonempty subset of C. 


Proof Suppose T ∈ B(V), where V is a complex Hilbert space with V ≠ {0}, and sp(T) = ∅. Let f ∈ V with f ≠ 0. Because 0 ∉ sp(T), the operator T is invertible; take g = T⁻¹f in 10.37. Because sp(T) = ∅, 10.37 implies that the function

$$\alpha \mapsto \langle (T - \alpha I)^{-1} f, T^{-1} f\rangle$$

is analytic on all of C. The value of the function above at α = 0 equals the average value of the function on each circle in C centered at 0 (because analytic functions satisfy the mean value property). But 10.34(c) and the Cauchy–Schwarz inequality imply that this function has limit 0 as |α| → ∞. Thus taking the average over large circles, we see that the value of the function above at α = 0 is 0. In other words,

$$\langle T^{-1} f, T^{-1} f\rangle = 0.$$

Hence T⁻¹f = 0. Applying T to both sides of the equation T⁻¹f = 0 shows that f = 0, which contradicts our assumption that f ≠ 0. This contradiction means that our assumption that sp(T) = ∅ was false, completing the proof.


10.39 Definition p(T) 


Suppose T is an operator on a vector space V and p is a polynomial with coefficients in F:

$$p(z) = b_0 + b_1 z + \cdots + b_n z^n.$$

Then p(T) is the operator on V defined by

$$p(T) = b_0 I + b_1 T + \cdots + b_n T^n.$$


You should verify that if p and q are polynomials with coefficients in F and T is an operator, then

$$(pq)(T) = p(T)\,q(T).$$
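One way to make the identity (pq)(T) = p(T)q(T) concrete is to check it for a matrix. The sketch below is plain Python; the coefficient-list convention [b₀, b₁, ..., bₙ] and all helper names are assumptions for illustration. It evaluates p(T) by Horner's rule, forms the coefficients of pq by convolution, and compares (pq)(T) with p(T)q(T). Integer entries keep the comparison exact.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_of_matrix(coeffs, T):
    """p(T) = b0 I + b1 T + ... + bn T^n for coeffs = [b0, ..., bn], by Horner's rule."""
    n = len(T)
    result = [[0] * n for _ in range(n)]
    for b in reversed(coeffs):
        result = matmul(result, T)           # result <- result * T
        for i in range(n):
            result[i][i] += b                # result <- result + b I
    return result


def poly_mul(p, q):
    """Coefficient list of the product polynomial pq."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


T = [[1, 2], [3, 4]]
p = [1, 0, 2]          # p(z) = 1 + 2z^2
q = [-3, 1]            # q(z) = -3 + z
lhs = poly_of_matrix(poly_mul(p, q), T)
rhs = matmul(poly_of_matrix(p, T), poly_of_matrix(q, T))
print(lhs == rhs)      # True
```

The identity holds because every power of T commutes with every other power of T, which is exactly what the convolution of coefficients encodes.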
The next result provides a nice way to compute the spectrum of a polynomial applied to an operator. For example, this result implies that if T is a bounded operator on a complex Banach space, then the spectrum of T² consists of the squares of all numbers in the spectrum of T.

As with the previous result, the next result fails on real Banach spaces. As you 
can see, the proof below uses factorization of a polynomial with complex coefficients 
as the product of polynomials with degree 1, which is not necessarily possible when 
restricting to the field of real numbers. 


10.40 Spectral Mapping Theorem 


Suppose T is a bounded operator on a complex Banach space and p is a polynomial with complex coefficients. Then

$$\operatorname{sp}(p(T)) = p(\operatorname{sp}(T)).$$


Proof If p is a constant polynomial, then both sides of the equation above consist
of the set containing just that constant. Thus we can assume that p is a nonconstant 
polynomial. 

First suppose α ∈ sp(p(T)). Thus p(T) − αI is not invertible. By the Fundamental Theorem of Algebra, there exist c, β₁, ..., βₙ ∈ C with c ≠ 0 such that

$$p(z) - \alpha = c(z - \beta_1)\cdots(z - \beta_n) \tag{10.41}$$

for all z ∈ C. Thus

$$p(T) - \alpha I = c(T - \beta_1 I)\cdots(T - \beta_n I).$$

The left side of the equation above is not invertible. Hence T − βₖI is not invertible for some k ∈ {1, ..., n} (a product of invertible operators would be invertible). Thus βₖ ∈ sp(T). Now 10.41 implies p(βₖ) = α. Hence α ∈ p(sp(T)), completing the proof that sp(p(T)) ⊂ p(sp(T)).

To prove the inclusion in the other direction, now suppose β ∈ sp(T). The polynomial z ↦ p(z) − p(β) has a zero at β. Hence there exists a polynomial q with degree 1 less than the degree of p such that

$$p(z) - p(\beta) = (z - \beta)\,q(z)$$

for all z ∈ C. Thus

$$p(T) - p(\beta)I = (T - \beta I)\,q(T) \tag{10.42}$$

and

$$p(T) - p(\beta)I = q(T)\,(T - \beta I). \tag{10.43}$$

Because T − βI is not invertible, T − βI is not surjective or T − βI is not injective (a bijective bounded operator on a Banach space is invertible by the Bounded Inverse Theorem 6.83). If T − βI is not surjective, then 10.42 shows that p(T) − p(β)I is not surjective. If T − βI is not injective, then 10.43 shows that p(T) − p(β)I is not injective. Either way, we see that p(T) − p(β)I is not invertible. Thus p(β) ∈ sp(p(T)), completing the proof that sp(p(T)) ⊃ p(sp(T)).
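For matrices the Spectral Mapping Theorem can be observed directly, because the spectrum is the set of eigenvalues. The sketch below (plain Python; helper names are hypothetical) computes the eigenvalues of a 2-by-2 complex matrix as the roots of z² − (trace)z + det, evaluates p(T) by Horner's rule, and compares sp(p(T)) with p(sp(T)). It only illustrates the finite-dimensional case and does not replace the proof.

```python
import cmath


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def poly_of_matrix(coeffs, T):
    """p(T) by Horner's rule; coeffs = [b0, b1, ..., bn]."""
    result = [[0, 0], [0, 0]]
    for b in reversed(coeffs):
        result = matmul(result, T)
        result[0][0] += b
        result[1][1] += b
    return result


def eigenvalues_2x2(A):
    """Roots of z^2 - (trace A) z + det A, listed with multiplicity."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]


def poly_eval(coeffs, z):
    """p(z) for coeffs = [b0, b1, ..., bn]."""
    return sum(b * z ** k for k, b in enumerate(coeffs))


def same_points(xs, ys, tol=1e-9):
    """Compare two short lists of complex numbers up to order and rounding."""
    key = lambda z: (round(z.real, 6), round(z.imag, 6))
    return all(abs(x - y) < tol
               for x, y in zip(sorted(xs, key=key), sorted(ys, key=key)))


T = [[0, -1], [1, 0]]                                       # sp(T) = {i, -i} over C
p = [0, 3, 1]                                               # p(z) = 3z + z^2
lhs = eigenvalues_2x2(poly_of_matrix(p, T))                 # sp(p(T))
rhs = [poly_eval(p, lam) for lam in eigenvalues_2x2(T)]     # p(sp(T))
print(same_points(lhs, rhs))
```

Both lists come out as {−1 + 3i, −1 − 3i}; note that the real matrix T has no real eigenvalues, so the computation genuinely needs complex scalars, matching the remark before 10.40.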
Self-adjoint Operators


In this subsection, we look at a nice special class of bounded operators. 


10.44 Definition self-adjoint 


A bounded operator T on a Hilbert space is called self-adjoint if T* = T. 


The definition of the adjoint implies that a bounded operator T on a Hilbert space V is self-adjoint if and only if ⟨Tf, g⟩ = ⟨f, Tg⟩ for all f, g ∈ V. See Exercise 7 for an interesting result regarding this last condition.


10.45 Example self-adjoint operators


• Suppose b₁, b₂, ... is a bounded sequence in F. Define a bounded operator T: ℓ² → ℓ² by
T(a₁, a₂, ...) = (a₁b₁, a₂b₂, ...).
Then T*: ℓ² → ℓ² is the operator defined by
T*(a₁, a₂, ...) = (a₁b̄₁, a₂b̄₂, ...).
Hence T is self-adjoint if and only if bₖ ∈ R for all k ∈ Z⁺.

• More generally, suppose (X, S, μ) is a σ-finite measure space and h ∈ L^∞(μ). Define a bounded operator Mₕ ∈ B(L²(μ)) by Mₕf = fh. Then Mₕ* = M_{h̄}. Thus Mₕ is self-adjoint if and only if μ({x ∈ X : h(x) ∉ R}) = 0.

• Suppose n ∈ Z⁺, K is an n-by-n matrix, and ℐ_K: Fⁿ → Fⁿ is the operator of matrix multiplication by K (thinking of elements of Fⁿ as column vectors). Then (ℐ_K)* is the operator of multiplication by the conjugate transpose of K, as shown in Example 10.10. Thus ℐ_K is a self-adjoint operator if and only if the matrix K equals its conjugate transpose.

• More generally, suppose (X, S, μ) is a σ-finite measure space, K ∈ L²(μ × μ), and ℐ_K is the integral operator on L²(μ) defined in Example 10.5. Define K*: X × X → F by $K^*(y, x) = \overline{K(x, y)}$. Then (ℐ_K)* is the integral operator induced by K*, as shown in Example 10.5. Thus if K* = K, or in other words if $K(x, y) = \overline{K(y, x)}$ for all (x, y) ∈ X × X, then ℐ_K is self-adjoint.


Suppose U is a closed subspace of a Hilbert space V. Recall that P_U denotes the orthogonal projection of V onto U (see Section 8B). We have

$$\begin{aligned}
\langle P_U f, g\rangle &= \langle P_U f, P_U g + (I - P_U)g\rangle \\
&= \langle P_U f, P_U g\rangle \\
&= \langle f - (I - P_U)f, P_U g\rangle \\
&= \langle f, P_U g\rangle,
\end{aligned}$$

where the second and fourth equalities above hold because of 8.37(a). The equation above shows that P_U is a self-adjoint operator.
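The computation above can be checked numerically when U is a line. In the sketch below (plain Python; helper names are hypothetical), U = span{u} in C³, so P_U f = (⟨f, u⟩/⟨u, u⟩)u, and the inner product is linear in the first slot and conjugate-linear in the second. The script measures how far ⟨P_U f, g⟩ − ⟨f, P_U g⟩ and P_U(P_U f) − P_U f are from zero.

```python
def inner(x, y):
    """<x, y> = sum of x_j * conj(y_j)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def projection_onto_line(u):
    """Return the map f -> P_U f for U = span{u}."""
    uu = inner(u, u)

    def P(f):
        c = inner(f, u) / uu
        return [c * a for a in u]

    return P


u = [1 + 1j, 2 + 0j, -1j]
P = projection_onto_line(u)
f = [3 + 0j, -1 + 2j, 0.5 + 0j]
g = [1j, 1 + 0j, 2 - 1j]

adjoint_gap = abs(inner(P(f), g) - inner(f, P(g)))            # <P f, g> - <f, P g>
idempotent_gap = max(abs(a - b) for a, b in zip(P(P(f)), P(f)))
print(adjoint_gap, idempotent_gap)                              # both ~0
```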
For real Hilbert spaces, the next result requires the additional hypothesis that T is self-adjoint. To see that this extra hypothesis cannot be eliminated, consider the operator T: R² → R² defined by T(x, y) = (−y, x). Then T ≠ 0, but with the standard inner product on R², we have ⟨Tf, f⟩ = 0 for all f ∈ R² (which you can verify either algebraically or by thinking of T as counterclockwise rotation by a right angle).


10.46 ⟨Tf, f⟩ = 0 for all f implies T = 0


Suppose V is a Hilbert space, T ∈ B(V), and ⟨Tf, f⟩ = 0 for all f ∈ V.
(a) If F = C, then T = 0.
(b) If F = R and T is self-adjoint, then T = 0.


Proof First suppose F = C. If g, h ∈ V, then

$$\langle Tg, h\rangle = \frac{\langle T(g+h), g+h\rangle - \langle T(g-h), g-h\rangle}{4} + \frac{\langle T(g+ih), g+ih\rangle - \langle T(g-ih), g-ih\rangle}{4}\, i,$$

as can be verified by computing the right side. Our hypothesis that ⟨Tf, f⟩ = 0 for all f ∈ V implies that the right side above equals 0. Thus ⟨Tg, h⟩ = 0 for all g, h ∈ V. Taking h = Tg, we can conclude that T = 0, which completes the proof of (a).

Now suppose F = R and T is self-adjoint. Then

$$\langle Tg, h\rangle = \frac{\langle T(g+h), g+h\rangle - \langle T(g-h), g-h\rangle}{4}; \tag{10.47}$$

this is proved by computing the right side using the equation

$$\langle Th, g\rangle = \langle h, Tg\rangle = \langle Tg, h\rangle,$$

where the first equality holds because T is self-adjoint and the second equality holds because we are working in a real Hilbert space. Each term on the right side of 10.47 is of the form ⟨Tf, f⟩ for appropriate f. Thus ⟨Tg, h⟩ = 0 for all g, h ∈ V. This implies that T = 0 (take h = Tg), completing the proof of (b).
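The complex polarization identity used for (a) can be sanity-checked numerically: for a square matrix T and vectors g, h, the value ⟨Tg, h⟩ equals one quarter of the sum over w ∈ {1, i, −1, −i} of w⟨T(g + wh), g + wh⟩, which expands to the two fractions displayed in the proof. The sketch below is plain Python with hypothetical helper names and randomly chosen data.

```python
import random


def inner(x, y):
    """<x, y> = sum of x_j * conj(y_j)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def apply(T, v):
    """Matrix-vector product Tv."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def quad(T, v):
    """The quadratic form v -> <Tv, v>."""
    return inner(apply(T, v), v)


def polarize(T, g, h):
    """(1/4) * sum over w in {1, i, -1, -i} of w * <T(g + w h), g + w h>."""
    total = 0
    for w in (1, 1j, -1, -1j):
        v = [a + w * b for a, b in zip(g, h)]
        total += w * quad(T, v)
    return total / 4


def rand_c():
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))


random.seed(2024)
n = 3
T = [[rand_c() for _ in range(n)] for _ in range(n)]
g = [rand_c() for _ in range(n)]
h = [rand_c() for _ in range(n)]
gap = abs(polarize(T, g, h) - inner(apply(T, g), h))
print(gap)    # ~0
```

Because the right side only uses values of v ↦ ⟨Tv, v⟩, a form that vanishes identically forces ⟨Tg, h⟩ = 0, which is the heart of part (a).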


Some insight into the adjoint can be obtained by thinking of the operation T ↦ T* on B(V) as analogous to the operation z ↦ z̄ on C. Under this analogy, the self-adjoint operators (characterized by T* = T) correspond to the real numbers (characterized by z̄ = z). The first two bullet points in Example 10.45 illustrate this analogy, as we saw that a multiplication operator on L²(μ) is self-adjoint if and only if the multiplier is real-valued almost everywhere.

The next two results deepen the analogy between the self-adjoint operators and the real numbers. First we see this analogy reflected in the behavior of ⟨Tf, f⟩, and then we see this analogy reflected in the spectrum of T.
10.48 self-adjoint characterized by ⟨Tf, f⟩


Suppose T is a bounded operator on a complex Hilbert space V. Then T is self-adjoint if and only if

$$\langle Tf, f\rangle \in \mathbf{R}$$

for all f ∈ V.


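Proof Let f ∈ V. Then

$$\langle Tf, f\rangle - \overline{\langle Tf, f\rangle} = \langle Tf, f\rangle - \langle f, Tf\rangle = \langle Tf, f\rangle - \langle T^*f, f\rangle = \langle (T - T^*)f, f\rangle.$$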


If ⟨Tf, f⟩ ∈ R for every f ∈ V, then the left side of the equation above equals 0, so ⟨(T − T*)f, f⟩ = 0 for every f ∈ V. This implies that T − T* = 0 [by 10.46(a)]. Hence T is self-adjoint.

Conversely, if T is self-adjoint, then the right side of the equation above equals 0, so $\langle Tf, f\rangle = \overline{\langle Tf, f\rangle}$ for every f ∈ V. This implies that ⟨Tf, f⟩ ∈ R for every f ∈ V, as desired.
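In finite dimensions, 10.48 says that ⟨Kf, f⟩ is real for every f exactly when the matrix K equals its conjugate transpose. The sketch below (plain Python; the matrices, the vector f, and the helper names are chosen for illustration) evaluates the quadratic form for a Hermitian matrix K and for the real but nonsymmetric matrix N with rows (2, −3) and (3, 2), which appears in Example 10.51 and is not self-adjoint on C².

```python
def inner(x, y):
    """<x, y> = sum of x_j * conj(y_j)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply(T, v):
    """Matrix-vector product Tv."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def conj_transpose(T):
    """Conjugate transpose of a square matrix."""
    return [[complex(T[j][i]).conjugate() for j in range(len(T))]
            for i in range(len(T))]


K = [[2, 1 - 1j], [1 + 1j, -3]]      # equals its conjugate transpose
N = [[2, -3], [3, 2]]                # real but not symmetric
f = [1 + 2j, -0.5 + 1j]

form_K = inner(apply(K, f), f)
form_N = inner(apply(N, f), f)
print(conj_transpose(K) == [[complex(x) for x in row] for row in K])   # True
print(form_K.imag)                   # 0.0: <Kf, f> is real
print(form_N.imag)                   # -12.0: <Nf, f> is not real
```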


10.49 self-adjoint operators have real spectrum 


Suppose T is a bounded self-adjoint operator on a Hilbert space. Then sp(T) ⊂ R.


Proof The desired result holds if F = R because the spectrum of every operator on 
a real Hilbert space is, by definition, contained in R. 

Thus we assume that T is a bounded operator on a complex Hilbert space V. Suppose α, β ∈ R, with β ≠ 0. If f ∈ V, then

$$\begin{aligned}
\|(T - (\alpha + \beta i)I)f\|\,\|f\| &\ge \bigl|\langle (T - (\alpha + \beta i)I)f, f\rangle\bigr| \\
&= \bigl|\langle Tf, f\rangle - \alpha\|f\|^2 - \beta\|f\|^2 i\bigr| \\
&\ge |\beta|\,\|f\|^2,
\end{aligned}$$

where the first inequality comes from the Cauchy–Schwarz inequality (8.11) and the last inequality holds because ⟨Tf, f⟩ − α‖f‖² ∈ R (by 10.48). The inequality above implies that

$$\|f\| \le \frac{1}{|\beta|}\,\|(T - (\alpha + \beta i)I)f\|$$

for all f ∈ V. Now the equivalence of (a) and (b) in 10.29 shows that T − (α + βi)I is left invertible.

Because T is self-adjoint, the adjoint of T − (α + βi)I is T − (α − βi)I, which is left invertible by the same argument as above (just replace β by −β). Hence T − (α + βi)I is right invertible (because its adjoint is left invertible). Because the operator T − (α + βi)I is both left and right invertible, it is invertible. In other words, α + βi ∉ sp(T). Thus sp(T) ⊂ R, as desired.
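The key estimate in the proof, ‖(T − (α + βi)I)f‖ ≥ |β|‖f‖ for real α, β, can be spot-checked for a Hermitian matrix. The sketch below (plain Python; the matrix, the random sampling, and the helper names are illustrative assumptions) samples random real α, β and random f, records the smallest ratio ‖(K − (α + βi)I)f‖ / (|β|‖f‖), and also computes the eigenvalues of the matrix, which come out real.

```python
import cmath
import math
import random


def apply(T, v):
    """Matrix-vector product Tv."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def eigenvalues_2x2(A):
    """Roots of z^2 - (trace A) z + det A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]


K = [[2, 1 - 1j], [1 + 1j, -3]]          # self-adjoint on C^2
random.seed(7)
worst = math.inf
for _ in range(2000):
    alpha, beta = random.uniform(-5, 5), random.uniform(-5, 5)
    f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
    z = alpha + 1j * beta
    residual = [x - z * y for x, y in zip(apply(K, f), f)]   # (K - zI) f
    worst = min(worst, norm(residual) / (abs(beta) * norm(f)))

print(worst >= 1 - 1e-12)                 # the ratio never drops below 1
print(eigenvalues_2x2(K))                 # (-1 +/- sqrt(33))/2, imaginary parts 0
```

In fact, because K − αI is self-adjoint, ‖(K − (α + βi)I)f‖² = ‖(K − αI)f‖² + β²‖f‖², which explains why the sampled ratio stays at or above 1.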
We showed that a bounded operator on a complex nonzero Hilbert space has a nonempty spectrum. That result can fail on real Hilbert spaces (where by definition the spectrum is contained in R). For example, the operator T on R² defined by T(x, y) = (−y, x) has empty spectrum. However, the previous result and 10.38 can be used to show that every self-adjoint operator on a nonzero real Hilbert space has nonempty spectrum (see Exercise 9 for the details).

Although the spectrum of every self-adjoint operator on a nonzero Hilbert space is nonempty, it is not true that every self-adjoint operator has an eigenvalue. For example, the self-adjoint operator Mₓ ∈ B(L²([0, 1])) defined by (Mₓf)(x) = xf(x) has no eigenvalues.


Normal Operators 


Now we consider another nice special class of operators. 


10.50 Definition normal operator 


A bounded operator T on a Hilbert space is called normal if it commutes with its adjoint. In other words, T is normal if

$$T^*T = TT^*.$$


Clearly every self-adjoint operator is normal, but there exist normal operators that 
are not self-adjoint, as shown in the next example. 


10.51 Example normal operators 


• Suppose μ is a positive measure, h ∈ L^∞(μ), and Mₕ ∈ B(L²(μ)) is the multiplication operator defined by Mₕf = fh. Then Mₕ* = M_{h̄}, which means that Mₕ is self-adjoint if h is real valued. If F = C, then h can be complex valued and Mₕ is not necessarily self-adjoint. However,

$$M_h^* M_h = M_{|h|^2} = M_h M_h^*,$$

and thus Mₕ is a normal operator even when h is complex valued.


• Suppose T is the operator on F² whose matrix with respect to the standard basis is

$$\begin{pmatrix} 2 & -3 \\ 3 & 2 \end{pmatrix}.$$

Then T is not self-adjoint because the matrix above is not equal to its conjugate transpose. However, T*T = 13I and TT* = 13I, as you should verify. Because T*T = TT*, we conclude that T is a normal operator.
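The verification requested above takes a few lines. The sketch below (plain Python; helper names are hypothetical) forms T*T and TT* for the matrix with rows (2, −3) and (3, 2), confirms that T is not self-adjoint, and, as a preview of the characterization in 10.53, checks that ‖Tf‖ = ‖T*f‖ for one sample vector f.

```python
import math


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def adjoint(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]


def apply(A, v):
    """Matrix-vector product Av."""
    return [A[i][0] * v[0] + A[i][1] * v[1] for i in range(2)]


def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))


T = [[2, -3], [3, 2]]
TstarT = matmul(adjoint(T), T)
TTstar = matmul(T, adjoint(T))
print(TstarT, TTstar)          # [[13, 0], [0, 13]] [[13, 0], [0, 13]]
print(adjoint(T) == T)         # False: T is not self-adjoint

f = [1 + 1j, 2 - 0.5j]
print(abs(norm(apply(T, f)) - norm(apply(adjoint(T), f))))   # ~0
```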


10.52 Example an operator that is not normal

Suppose T is the right shift on ℓ²; thus T(a₁, a₂, ...) = (0, a₁, a₂, ...). Then T* is the left shift: T*(a₁, a₂, ...) = (a₂, a₃, ...). Hence T*T is the identity operator on ℓ² and TT* is the operator (a₁, a₂, a₃, ...) ↦ (0, a₂, a₃, ...). Thus T*T ≠ TT*, which means that T is not a normal operator.
10.53 normal in terms of norms


Suppose T is a bounded operator on a Hilbert space V. Then T is normal if and only if

$$\|Tf\| = \|T^*f\|$$

for all f ∈ V.


Proof If f ∈ V, then

$$\|Tf\|^2 - \|T^*f\|^2 = \langle Tf, Tf\rangle - \langle T^*f, T^*f\rangle = \langle (T^*T - TT^*)f, f\rangle.$$

If T is normal, then the right side of the equation above equals 0, which implies that the left side also equals 0 and hence ‖Tf‖ = ‖T*f‖.

Conversely, suppose ‖Tf‖ = ‖T*f‖ for all f ∈ V. Then the left side of the equation above equals 0, which implies that the right side also equals 0 for all f ∈ V. Because T*T − TT* is self-adjoint, 10.46 now implies that T*T − TT* = 0. Thus T is normal, completing the proof.


Each complex number can be written in the form a + bi, where a and b are real
numbers. Part (a) of the next result gives the analogous result for bounded operators 
on a complex Hilbert space, with self-adjoint operators playing the role of real 
numbers. We could call the operators A and B in part (a) the real and imaginary parts 
of the operator T. Part (b) below shows that normality depends upon whether these 
real and imaginary parts commute. 


10.54 operator is normal if and only if its real and imaginary parts commute 


Suppose T is a bounded operator on a complex Hilbert space V. 


(a) There exist unique self-adjoint operators A, B on V such that T = A + iB.


(b) T is normal if and only if AB = BA, where A, B are as in part (a). 


Proof Suppose T = A + iB, where A and B are self-adjoint. Then T* = A − iB. Adding these equations for T and T* and then dividing by 2 produces a formula for A; subtracting the equation for T* from the equation for T and then dividing by 2i produces a formula for B. Specifically, we have

$$A = \frac{T + T^*}{2} \quad\text{and}\quad B = \frac{T - T^*}{2i},$$

which proves the uniqueness part of (a). The existence part of (a) is proved by defining A and B by the equations above and noting that A and B as defined above are self-adjoint and T = A + iB.

To prove (b), verify that if A and B are defined as in the equations above, then

$$AB - BA = \frac{T^*T - TT^*}{2i}.$$

Thus AB = BA if and only if T is normal.
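With A = (T + T*)/2 and B = (T − T*)/(2i), the formulas obtained by adding and subtracting the equations for T and T*, the claims in 10.54 can be checked for a 2-by-2 matrix. The sketch below (plain Python; the matrix and helper names are illustrative assumptions) uses the non-normal matrix with rows (1, 2) and (0, i), checks that A and B are self-adjoint, that T = A + iB, and that AB − BA = (T*T − TT*)/(2i), and confirms that A and B fail to commute.

```python
def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def adjoint(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def combo(a, X, b, Y):
    """aX + bY for 2x2 matrices X, Y and scalars a, b."""
    return [[a * X[i][j] + b * Y[i][j] for j in range(2)] for i in range(2)]


def max_gap(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))


T = [[1 + 0j, 2 + 0j], [0j, 1j]]            # not normal
Ts = adjoint(T)
A = combo(0.5, T, 0.5, Ts)                  # (T + T*)/2
B = combo(1 / 2j, T, -1 / 2j, Ts)           # (T - T*)/(2i)

checks = {
    "A self-adjoint": max_gap(A, adjoint(A)),
    "B self-adjoint": max_gap(B, adjoint(B)),
    "T = A + iB": max_gap(combo(1, A, 1j, B), T),
    "AB - BA = (T*T - TT*)/(2i)": max_gap(
        combo(1, matmul(A, B), -1, matmul(B, A)),
        combo(1 / 2j, matmul(Ts, T), -1 / 2j, matmul(T, Ts))),
}
for name, gap in checks.items():
    print(name, gap)                        # each gap is ~0
```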
An operator on a finite-dimensional vector space is left invertible if and only if
it is right invertible. We have seen that this result fails for bounded operators on 
infinite-dimensional Hilbert spaces. However, the next result shows that we recover 
this equivalency for normal operators. 


10.55 invertibility for normal operators 


Suppose V is a Hilbert space and T € B(V) is normal. Then the following are 
equivalent: 


(a) T is invertible.
(b) T is left invertible.
(c) T is right invertible.
(d) T is surjective.
(e) T is injective and has closed range.
(f) T*T is invertible.
(g) TT* is invertible.


Proof Because T is normal, (f) and (g) are clearly equivalent. From 10.29, we know 
that (f), (b), and (e) are equivalent to each other. From 10.31, we know that (g), 
(c), and (d) are equivalent to each other. Thus (b), (c), (d), (e), (f), and (g) are all 
equivalent to each other. 

Clearly (a) implies (b). 

Suppose (b) holds. We already know that (b) and (c) are equivalent; thus T is left 
invertible and T is right invertible. Hence T is invertible, proving that (b) implies (a) 
and completing the proof that (a) through (g) are all equivalent to each other. 


The next result shows that a normal operator and its adjoint have the same eigenvectors, with eigenvalues that are complex conjugates of each other. This result can fail for operators that are not normal. For example, 0 is an eigenvalue of the left shift on ℓ², but its adjoint, the right shift, has no eigenvectors and no eigenvalues.


10.56 T normal and Tf = αf implies T*f = ᾱf


Suppose T is a normal operator on a Hilbert space V, α ∈ F, and f ∈ V. Then α is an eigenvalue of T with eigenvector f if and only if ᾱ is an eigenvalue of T* with eigenvector f.


Proof Because (T − αI)* = T* − ᾱI and T is normal, T − αI commutes with its adjoint. Thus T − αI is normal. Hence 10.53 implies that

$$\|(T - \alpha I)f\| = \|(T^* - \bar{\alpha} I)f\|.$$

Thus (T − αI)f = 0 if and only if (T* − ᾱI)f = 0, as desired.
Because every self-adjoint operator is normal, the following result also holds for
self-adjoint operators. 


10.57 orthogonal eigenvectors for normal operators 


Eigenvectors of a normal operator corresponding to distinct eigenvalues are 
orthogonal. 


Proof Suppose α and β are distinct eigenvalues of a normal operator T, with corresponding eigenvectors f and g. Then 10.56 implies that T*f = ᾱf. Thus

$$(\beta - \alpha)\langle g, f\rangle = \langle \beta g, f\rangle - \langle g, \bar{\alpha} f\rangle = \langle Tg, f\rangle - \langle g, T^*f\rangle = 0.$$

Because α ≠ β, the equation above implies that ⟨g, f⟩ = 0, as desired.
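For the normal matrix with rows (2, −3) and (3, 2) from Example 10.51, the eigenvalues are 2 + 3i and 2 − 3i, with eigenvectors (1, −i) and (1, i). The sketch below (plain Python; helper names are hypothetical) checks 10.56 (each eigenvector of T is an eigenvector of T* for the conjugate eigenvalue) and 10.57 (the two eigenvectors are orthogonal).

```python
def apply(A, v):
    """Matrix-vector product Av for a 2x2 matrix."""
    return [A[i][0] * v[0] + A[i][1] * v[1] for i in range(2)]


def inner(x, y):
    """<x, y> = sum of x_j * conj(y_j)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


T = [[2, -3], [3, 2]]          # normal (Example 10.51)
T_star = [[2, 3], [-3, 2]]     # its adjoint: the transpose, since the entries are real
pairs = [(2 + 3j, [1, -1j]), (2 - 3j, [1, 1j])]

for lam, v in pairs:
    eig_gap = max(abs(a - lam * b) for a, b in zip(apply(T, v), v))               # T v = lam v
    adj_gap = max(abs(a - lam.conjugate() * b) for a, b in zip(apply(T_star, v), v))  # T* v = conj(lam) v
    print(lam, eig_gap, adj_gap)   # both gaps ~0

print(abs(inner(pairs[0][1], pairs[1][1])))   # ~0: the eigenvectors are orthogonal
```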


Isometries and Unitary Operators 


10.58 Definition isometry, unitary operator 


Suppose T is a bounded operator on a Hilbert space V. 


• T is called an isometry if ‖Tf‖ = ‖f‖ for every f ∈ V.

• T is called unitary if T*T = TT* = I.


10.59 Example isometries and unitary operators 


• Suppose T ∈ B(ℓ²) is the right shift defined by
T(a₁, a₂, a₃, ...) = (0, a₁, a₂, a₃, ...).
Then T is an isometry but is not a unitary operator because TT* ≠ I (as is clear without even computing T* because T is not surjective).

• Suppose T ∈ B(ℓ²(Z)) is the right shift defined by
(Tf)(n) = f(n − 1)
for f: Z → F with Σ_{k∈Z} |f(k)|² < ∞. Then T is an isometry and is unitary.

• Suppose b₁, b₂, ... is a bounded sequence in F. Define T ∈ B(ℓ²) by
T(a₁, a₂, ...) = (a₁b₁, a₂b₂, ...).
Then T is an isometry if and only if T is unitary if and only if |bₖ| = 1 for all k ∈ Z⁺.

• More generally, suppose (X, S, μ) is a σ-finite measure space and h ∈ L^∞(μ). Define Mₕ ∈ B(L²(μ)) by Mₕf = fh. Then Mₕ is an isometry if and only if Mₕ is unitary if and only if μ({x ∈ X : |h(x)| ≠ 1}) = 0.
By definition, isometries preserve norms. The equivalence of (a) and (b) in the
following result shows that isometries also preserve inner products. 


10.60 isometries preserve inner products 


Suppose T is a bounded operator on a Hilbert space V. Then the following are 
equivalent: 


(a) T is an isometry.
(b) ⟨Tf, Tg⟩ = ⟨f, g⟩ for all f, g ∈ V.
(c) T*T = I.
(d) {Teₖ}_{k∈Γ} is an orthonormal family for every orthonormal family {eₖ}_{k∈Γ} in V.
(e) {Teₖ}_{k∈Γ} is an orthonormal family for some orthonormal basis {eₖ}_{k∈Γ} of V.


Proof If f ∈ V, then

$$\|Tf\|^2 - \|f\|^2 = \langle Tf, Tf\rangle - \langle f, f\rangle = \langle (T^*T - I)f, f\rangle.$$

Thus ‖Tf‖ = ‖f‖ for all f ∈ V if and only if the right side of the equation above is 0 for all f ∈ V. Because T*T − I is self-adjoint, this happens if and only if T*T − I = 0 (by 10.46). Thus (a) is equivalent to (c).

If T*T = I, then ⟨Tf, Tg⟩ = ⟨T*Tf, g⟩ = ⟨f, g⟩ for all f, g ∈ V. Thus (c) implies (b).

Taking g = f in (b), we see that (b) implies (a). Hence we now know that (a), (b), and (c) are equivalent to each other.

To prove that (b) implies (d), suppose (b) holds. If {eₖ}_{k∈Γ} is an orthonormal family in V, then ⟨Teⱼ, Teₖ⟩ = ⟨eⱼ, eₖ⟩ for all j, k ∈ Γ, and thus {Teₖ}_{k∈Γ} is an orthonormal family in V. Hence (b) implies (d).

Because V has an orthonormal basis (see 8.67 or 8.75), (d) implies (e).

Finally, suppose (e) holds. Thus {Teₖ}_{k∈Γ} is an orthonormal family for some orthonormal basis {eₖ}_{k∈Γ} of V. Suppose f ∈ V. Then by 8.63(a) we have

$$f = \sum_{j\in\Gamma} \langle f, e_j\rangle e_j,$$

which implies that

$$T^*Tf = \sum_{j\in\Gamma} \langle f, e_j\rangle\, T^*Te_j.$$

Thus if k ∈ Γ, then

$$\langle T^*Tf, e_k\rangle = \sum_{j\in\Gamma} \langle f, e_j\rangle \langle T^*Te_j, e_k\rangle = \sum_{j\in\Gamma} \langle f, e_j\rangle \langle Te_j, Te_k\rangle = \langle f, e_k\rangle,$$

where the last equality holds because ⟨Teⱼ, Teₖ⟩ equals 1 if j = k and equals 0 otherwise. Because the equality above holds for every eₖ in the orthonormal basis {eₖ}_{k∈Γ}, we conclude that T*Tf = f. Thus (e) implies (c), completing the proof.
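The right shift from Example 10.59 gives a concrete instance of the equivalence of (a), (b), and (c): it preserves inner products and satisfies T*T = I, even though TT* ≠ I. The sketch below models finitely supported elements of ℓ² as Python lists (an illustrative assumption: missing entries are zero) and uses hypothetical helper names.

```python
from itertools import zip_longest


def inner(x, y):
    """<x, y> for finitely supported sequences; missing entries count as 0."""
    return sum(a * complex(b).conjugate()
               for a, b in zip_longest(x, y, fillvalue=0))


def right_shift(a):
    """T(a1, a2, ...) = (0, a1, a2, ...), an isometry of l^2."""
    return [0] + list(a)


def left_shift(a):
    """T*(a1, a2, ...) = (a2, a3, ...), the adjoint of the right shift."""
    return list(a[1:])


f = [1, 2j, -3]
g = [4, 1 - 1j, 0.5, 2]

print(abs(inner(right_shift(f), right_shift(g)) - inner(f, g)))   # ~0: (b) holds
print(abs(inner(right_shift(f), g) - inner(f, left_shift(g))))     # ~0: T* is the left shift
print(left_shift(right_shift(f)) == f)                             # True: T*T = I
print(right_shift(left_shift(f)) == f)                             # False: TT* != I
```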
The equivalence between (a) and (c) in the previous result shows that every unitary
operator is an isometry. 

Next we have a result giving conditions that are equivalent to being a unitary 
operator. Notice that parts (d) and (e) of the previous result refer to orthonormal 
families, but parts (f) and (g) of the following result refer to orthonormal bases. 


10.61 unitary operators and their adjoints are isometries 


Suppose T is a bounded operator on a Hilbert space V. Then the following are 
equivalent: 


(a) T is unitary.
(b) T is a surjective isometry.
(c) T and T* are both isometries.
(d) T* is unitary.
(e) T is invertible and T⁻¹ = T*.
(f) {Teₖ}_{k∈Γ} is an orthonormal basis of V for every orthonormal basis {eₖ}_{k∈Γ} of V.
(g) {Teₖ}_{k∈Γ} is an orthonormal basis of V for some orthonormal basis {eₖ}_{k∈Γ} of V.


Proof The equivalence of (a), (d), and (e) follows easily from the definition of 
unitary. 

The equivalence of (a) and (c) follows from the equivalence in 10.60 of (a) and (c). 

To prove that (a) implies (b), suppose (a) holds, so T is unitary. As we have 
already noted, this implies that T is an isometry. Also, the equation TT* = I implies 
that T is surjective. Thus (b) holds, proving that (a) implies (b). 

Now suppose (b) holds, so T is a surjective isometry. Because T is surjective and injective, T is invertible. The equation T*T = I [which follows from the equivalence in 10.60 of (a) and (c)] now implies that T⁻¹ = T*. Thus (b) implies (e). Hence at this stage of the proof, we know that (a), (b), (c), (d), and (e) are all equivalent to each other.

To prove that (b) implies (f), suppose (b) holds, so T is a surjective isometry. Suppose {eₖ}_{k∈Γ} is an orthonormal basis of V. The equivalence in 10.60 of (a) and (d) implies that {Teₖ}_{k∈Γ} is an orthonormal family. Because {eₖ}_{k∈Γ} is an orthonormal basis of V and T is surjective, the closure of the span of {Teₖ}_{k∈Γ} equals V. Thus {Teₖ}_{k∈Γ} is an orthonormal basis of V, which proves that (b) implies (f).

Obviously (f) implies (g). 

Now suppose (g) holds. The equivalence in 10.60 of (a) and (e) implies that T is an isometry, which implies that the range of T is closed. Because {Teₖ}_{k∈Γ} is an orthonormal basis of V, the closure of the range of T equals V. Thus T is a surjective isometry, proving that (g) implies (b) and completing the proof that (a) through (g) are all equivalent to each other.
The equations T*T = TT* = I are analogous to the equation |z|² = 1 for z ∈ C. We now extend this analogy to the behavior of the spectrum of a unitary operator.


10.62 spectrum of a unitary operator 


Suppose T is a unitary operator on a Hilbert space. Then

$$\operatorname{sp}(T) \subset \{\alpha \in \mathbf{F} : |\alpha| = 1\}.$$


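Proof Suppose $\alpha \in \mathbf{F}$ with $|\alpha| \neq 1$. Then

$$
\begin{aligned}
(T - \alpha I)^*(T - \alpha I) &= (T^* - \bar\alpha I)(T - \alpha I)\\
&= (1 + |\alpha|^2)I - (\alpha T^* + \bar\alpha T)\\
&= (1 + |\alpha|^2)\Bigl(I - \frac{\alpha T^* + \bar\alpha T}{1 + |\alpha|^2}\Bigr).
\end{aligned}
\tag{10.63}
$$

Looking at the last term in parentheses above, we have (because $\|T\| = \|T^*\| = 1$)

$$\Bigl\|\frac{\alpha T^* + \bar\alpha T}{1 + |\alpha|^2}\Bigr\| \le \frac{2|\alpha|}{1 + |\alpha|^2} < 1, \tag{10.64}$$

where the last inequality holds because $|\alpha| \neq 1$ (it is equivalent to $(1 - |\alpha|)^2 > 0$). Now 10.64, 10.63, and 10.22 imply
that $(T - \alpha I)^*(T - \alpha I)$ is invertible. Thus $T - \alpha I$ is left invertible. Because $T - \alpha I$
is normal, this implies that $T - \alpha I$ is invertible (see 10.55). Hence $\alpha \notin \operatorname{sp}(T)$. Thus
$\operatorname{sp}(T) \subset \{\alpha \in \mathbf{F} : |\alpha| = 1\}$, as desired.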
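A small finite-dimensional check may help here. In dimension 2, a real rotation matrix is unitary, and its eigenvalues (which form its spectrum) should lie on the unit circle. The sketch below uses only the standard library. The rotation angles are arbitrary illustrative choices, and the eigenvalues come from the characteristic polynomial $\lambda^2 - (\operatorname{trace})\lambda + \det = 0$.

```python
import cmath
import math


def rotation(theta):
    """Real 2x2 rotation matrix; its transpose is its inverse, so it is unitary."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def eigenvalues_2x2(m):
    """Roots of lambda^2 - (trace) lambda + det = 0 for a 2x2 matrix m."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + disc) / 2, (tr - disc) / 2)


for theta in (0.3, 1.0, 2.5):
    moduli = [abs(lam) for lam in eigenvalues_2x2(rotation(theta))]
    print(theta, moduli)  # each modulus is 1 up to rounding
```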


As a special case of the next result, we can conclude (without doing any calculations!)
that the spectrum of the right shift on $\ell^2$ is $\{\alpha \in \mathbf{F} : |\alpha| \le 1\}$.


10.65 spectrum of an isometry 


Suppose T is an isometry on a Hilbert space and T is not unitary. Then 


$$\operatorname{sp}(T) = \{\alpha \in \mathbf{F} : |\alpha| \le 1\}.$$


Proof Because T is an isometry but is not unitary, we know that T is not surjective 
[by the equivalence of (a) and (b) in 10.61]. In particular, T is not invertible. Thus 
T* is not invertible. 

Suppose $\alpha \in \mathbf{F}$ with $|\alpha| < 1$. Because $T^*T = I$, we have

$$T^*(T - \alpha I) = I - \alpha T^*.$$

The right side of the equation above is invertible (by 10.22). If $T - \alpha I$ were invertible,
then the equation above would imply $T^* = (I - \alpha T^*)(T - \alpha I)^{-1}$, which would
make $T^*$ invertible as the product of invertible operators. However, the paragraph
above shows $T^*$ is not invertible. Thus $T - \alpha I$ is not invertible. Hence $\alpha \in \operatorname{sp}(T)$.

Thus $\{\alpha \in \mathbf{F} : |\alpha| < 1\} \subset \operatorname{sp}(T)$. Because $\operatorname{sp}(T)$ is closed (see 10.36), this
implies $\{\alpha \in \mathbf{F} : |\alpha| \le 1\} \subset \operatorname{sp}(T)$. The inclusion in the other direction follows
from 10.34(a). Thus $\operatorname{sp}(T) = \{\alpha \in \mathbf{F} : |\alpha| \le 1\}$.
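The spectrum of the right shift on $\ell^2$ can also be seen concretely. This is an alternative to the argument above, not the one used in the text. The adjoint of the right shift is the backward shift $(a_1, a_2, a_3, \dots) \mapsto (a_2, a_3, \dots)$. For $|\beta| < 1$, the sequence $(1, \beta, \beta^2, \dots)$ lies in $\ell^2$ and is an eigenvector of the backward shift with eigenvalue $\beta$. Taking $\beta = \bar\alpha$ shows that $T^* - \bar\alpha I$ is not injective, so $T - \alpha I$ is not invertible. The sketch below checks this on a finite truncation. The truncation length and the value of $\alpha$ are arbitrary assumptions.

```python
def backward_shift(a):
    """Backward shift (a1, a2, a3, ...) -> (a2, a3, ...): the adjoint of the right shift."""
    return a[1:]


def geometric(beta, n):
    """First n terms of (1, beta, beta^2, ...); this sequence is in l^2 when |beta| < 1."""
    return [beta ** k for k in range(n)]


alpha = 0.6 + 0.3j                 # illustrative point with |alpha| < 1
beta = alpha.conjugate()
f = geometric(beta, 200)           # truncation length 200 is an assumption
lhs = backward_shift(f)            # first 199 entries of (backward shift) f
rhs = [beta * x for x in f[:-1]]   # first 199 entries of beta * f
max_err = max(abs(x - y) for x, y in zip(lhs, rhs))
norm_sq = sum(abs(x) ** 2 for x in f)
print(max_err)                                  # rounding error only
print(norm_sq, 1 / (1 - abs(beta) ** 2))        # partial sum vs. exact ||f||^2
```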
EXERCISES 10B


1 Verify all the assertions in Example 10.33.

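2 Suppose $T$ is a bounded operator on a Hilbert space $V$.

(a) Prove that $\operatorname{sp}(S^{-1}TS) = \operatorname{sp}(T)$ for all bounded invertible operators $S$ on $V$.
(b) Prove that $\operatorname{sp}(T^*) = \{\bar\alpha : \alpha \in \operatorname{sp}(T)\}$.

(c) Prove that if $T$ is invertible, then $\operatorname{sp}(T^{-1}) = \{\frac{1}{\alpha} : \alpha \in \operatorname{sp}(T)\}$.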

3 Suppose $E$ is a bounded subset of $\mathbf{F}$. Show that there exists a Hilbert space $V$
and $T \in \mathcal{B}(V)$ such that the set of eigenvalues of $T$ equals $E$.


4 Suppose $E$ is a nonempty closed bounded subset of $\mathbf{F}$. Show that there exists
$T \in \mathcal{B}(\ell^2)$ such that $\operatorname{sp}(T) = E$.


5 Give an example of a bounded operator $T$ on a normed vector space such that
for every $\alpha \in \mathbf{F}$, the operator $T - \alpha I$ is not invertible.


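6 Suppose $T$ is a bounded operator on a complex nonzero Banach space $V$.

(a) Prove that the function
$$\alpha \mapsto \varphi\bigl((T - \alpha I)^{-1}f\bigr)$$
is analytic on $\mathbf{C} \setminus \operatorname{sp}(T)$ for every $f \in V$ and every $\varphi \in V'$.
(b) Prove that $\operatorname{sp}(T) \neq \varnothing$.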


7 Prove that if $T$ is an operator on a Hilbert space $V$ such that $\langle Tf, g\rangle = \langle f, Tg\rangle$
for all $f, g \in V$, then $T$ is a bounded operator.


8 Suppose $P$ is a bounded operator on a Hilbert space $V$ such that $P^2 = P$. Prove
that $P$ is self-adjoint if and only if there exists a closed subspace $U$ of $V$ such
that $P = P_U$.


9 Suppose $V$ is a real Hilbert space and $T \in \mathcal{B}(V)$. The complexification of $T$ is
the function $T_{\mathbf{C}}: V_{\mathbf{C}} \to V_{\mathbf{C}}$ defined by
$$T_{\mathbf{C}}(f + ig) = Tf + iTg$$
for $f, g \in V$ (see Exercise 4 in Section 8B for the definition of $V_{\mathbf{C}}$).
(a) Show that $T_{\mathbf{C}}$ is a bounded operator on the complex Hilbert space $V_{\mathbf{C}}$ and
$\|T_{\mathbf{C}}\| = \|T\|$.
(b) Show that $T_{\mathbf{C}}$ is invertible if and only if $T$ is invertible.
(c) Show that $(T_{\mathbf{C}})^* = (T^*)_{\mathbf{C}}$.
(d) Show that $T$ is self-adjoint if and only if $T_{\mathbf{C}}$ is self-adjoint.
(e) Use the previous parts of this exercise and 10.49 and 10.38 to show that if
$T$ is self-adjoint and $V \neq \{0\}$, then $\operatorname{sp}(T) \neq \varnothing$.
10 Suppose $T$ is a bounded operator on a Hilbert space $V$ such that $\langle Tf, f\rangle \ge 0$
for all $f \in V$. Prove that $\operatorname{sp}(T) \subset [0, \infty)$.


11 Suppose $P$ is a bounded operator on a Hilbert space $V$ such that $P^2 = P$. Prove
that $P$ is self-adjoint if and only if $P$ is normal.


12 Prove that a normal operator on a separable Hilbert space has at most countably
many eigenvalues.


13 Prove or give a counterexample: If $T$ is a normal operator on a Hilbert space and
$T = A + iB$, where $A$ and $B$ are self-adjoint, then $\|T\| = \sqrt{\|A\|^2 + \|B\|^2}$.


14 A number $\alpha \in \mathbf{F}$ is called an approximate eigenvalue of a bounded operator $T$
on a Hilbert space $V$ if
$$\inf\{\|(T - \alpha I)f\| : f \in V \text{ and } \|f\| = 1\} = 0.$$
Suppose $T$ is a normal operator on a Hilbert space and $\alpha \in \mathbf{F}$. Prove that
$\alpha \in \operatorname{sp}(T)$ if and only if $\alpha$ is an approximate eigenvalue of $T$.


15 Suppose $T$ is a normal operator on a Hilbert space.

(a) Prove that if $\alpha$ is an eigenvalue of $T$, then $|\alpha|^2$ is an eigenvalue of $T^*T$.
(b) Prove that if $\alpha \in \operatorname{sp}(T)$, then $|\alpha|^2 \in \operatorname{sp}(T^*T)$.


16 Suppose $\{e_k\}_{k\in\mathbf{Z}^+}$ is an orthonormal basis of a Hilbert space $V$. Suppose also
that $T$ is a normal operator on $V$ and $e_k$ is an eigenvector of $T$ for every $k \ge 2$.
Prove that $e_1$ is an eigenvector of $T$.


17 Prove that if $T$ is a self-adjoint operator on a Hilbert space, then $\|T^n\| = \|T\|^n$
for every $n \in \mathbf{Z}^+$.


18 Prove that if $T$ is a normal operator on a Hilbert space, then $\|T^n\| = \|T\|^n$ for
every $n \in \mathbf{Z}^+$.


19 Suppose $T$ is an invertible operator on a Hilbert space. Prove that $T$ is unitary if
and only if $\|T\| = \|T^{-1}\| = 1$.


20 Suppose $T$ is a bounded operator on a complex Hilbert space, with $T = A + iB$,
where $A$ and $B$ are self-adjoint (see 10.54). Prove that $T$ is unitary if and only if
$T$ is normal and $A^2 + B^2 = I$.

[If $z = x + yi$, where $x, y \in \mathbf{R}$, then $|z| = 1$ if and only if $x^2 + y^2 = 1$. Thus
this exercise strengthens the analogy between the unit circle in the complex
plane and the unitary operators.]


21 Suppose $T$ is a unitary operator on a complex Hilbert space such that $T - I$ is
invertible. Prove that
$$i(T + I)(T - I)^{-1}$$
is a self-adjoint operator.
[The function $z \mapsto i(z + 1)(z - 1)^{-1}$ maps $\{z \in \mathbf{C} : |z| = 1\} \setminus \{1\}$ to $\mathbf{R}$.
Thus this exercise provides another useful illustration of the analogies showing
unitary $\leftrightarrow \{z \in \mathbf{C} : |z| = 1\}$ and self-adjoint $\leftrightarrow \mathbf{R}$.]
22 Suppose $T$ is a self-adjoint operator on a complex Hilbert space. Prove that
$$(T + iI)(T - iI)^{-1}$$
is a unitary operator.

[The function $z \mapsto (z + i)(z - i)^{-1}$ maps $\mathbf{R}$ to $\{z \in \mathbf{C} : |z| = 1\} \setminus \{1\}$.
Thus this exercise provides another useful illustration of the analogies showing
unitary $\leftrightarrow \{z \in \mathbf{C} : |z| = 1\}$ and self-adjoint $\leftrightarrow \mathbf{R}$.]


For $T$ a bounded operator on a Banach space, define $e^T$ by
$$e^T = \sum_{k=0}^{\infty} \frac{T^k}{k!}.$$

23 (a) Prove that if $T$ is a bounded operator on a Banach space $V$, then the infinite
sum above converges in $\mathcal{B}(V)$ and $\|e^T\| \le e^{\|T\|}$.

(b) Prove that if $S, T$ are bounded operators on a Banach space $V$ such that
$ST = TS$, then $e^{S+T} = e^S e^T$.
(c) Prove that if $T$ is a self-adjoint operator on a complex Hilbert space, then
$e^{iT}$ is unitary.


A bounded operator $T$ on a Hilbert space is called a partial isometry if
$\|Tf\| = \|f\|$ for all $f \in (\operatorname{null} T)^\perp$.


24 Suppose $(X, \mathcal{S}, \mu)$ is a $\sigma$-finite measure space and $h \in L^\infty(\mu)$. As usual, let
$M_h \in \mathcal{B}(L^2(\mu))$ denote the multiplication operator defined by $M_h f = fh$.
Prove that $M_h$ is a partial isometry if and only if there exists a set $E \in \mathcal{S}$ such
that $|h| = \chi_E$.


25 Suppose T is an isometry on a Hilbert space. Prove that T* is a partial isometry. 


26 Suppose $T$ is a bounded operator on a Hilbert space $V$. Prove that $T$ is a partial
isometry if and only if $T^*T = P_U$ for some closed subspace $U$ of $V$.
10C Compact Operators


The Ideal of Compact Operators 


A rich theory describes the behavior of compact operators, which we now define. 


10.66 Definition compact operator 


• An operator $T$ on a Hilbert space $V$ is called compact if for every bounded
sequence $f_1, f_2, \dots$ in $V$, the sequence $Tf_1, Tf_2, \dots$ has a convergent
subsequence.

• The collection of compact operators on $V$ is denoted by $\mathcal{C}(V)$.


The next result provides a large class of examples of compact operators. We will 
see more examples after proving a few more results. 


10.67 bounded operators with finite-dimensional range are compact 


If T is a bounded operator on a Hilbert space and range T is finite-dimensional, 
then T is compact. 


Proof Suppose $T$ is a bounded operator on a Hilbert space $V$ and range $T$ is
finite-dimensional. Suppose $e_1, \dots, e_m$ is an orthonormal basis of range $T$ (a finite
orthonormal basis of range $T$ exists because the Gram–Schmidt process applied to
any basis of range $T$ produces an orthonormal basis; see the proof of 8.67).

Now suppose $f_1, f_2, \dots$ is a bounded sequence in $V$. For each $n \in \mathbf{Z}^+$, we have
$$Tf_n = \langle Tf_n, e_1\rangle e_1 + \cdots + \langle Tf_n, e_m\rangle e_m.$$
The Cauchy–Schwarz inequality shows that $|\langle Tf_n, e_j\rangle| \le \|T\| \sup_{k\in\mathbf{Z}^+}\|f_k\|$ for every
$n \in \mathbf{Z}^+$ and $j \in \{1, \dots, m\}$. Thus (applying the Bolzano–Weierstrass Theorem in $\mathbf{F}^m$)
there exists a subsequence $f_{n_1}, f_{n_2}, \dots$ such that
$\lim_{k\to\infty}\langle Tf_{n_k}, e_j\rangle$ exists in $\mathbf{F}$ for each $j \in \{1, \dots, m\}$. The equation displayed above
now implies that $\lim_{k\to\infty} Tf_{n_k}$ exists in $V$. Thus $T$ is compact.
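The expansion $Tf_n = \sum_j \langle Tf_n, e_j\rangle e_j$ and the coefficient bound in the proof can be checked numerically in a finite-dimensional setting. The sketch below assumes real scalars and uses a hypothetical rank-2 operator on $\mathbf{R}^4$ of the form $Tf = \langle f, v_1\rangle u_1 + \langle f, v_2\rangle u_2$. Gram–Schmidt produces an orthonormal basis $e_1, e_2$ of range $T$, and the coefficients $\langle Tf, e_j\rangle$ then reconstruct $Tf$ and are bounded by $\|Tf\|$.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Orthonormalize linearly independent real vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(v, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = math.sqrt(dot(w, w))
        basis.append([wi / n for wi in w])
    return basis


# Hypothetical rank-2 operator on R^4: T f = <f, v1> u1 + <f, v2> u2.
u1, v1 = [1.0, 2.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]
u2, v2 = [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 0.0, 2.0]


def T(f):
    a, b = dot(f, v1), dot(f, v2)
    return [a * x + b * y for x, y in zip(u1, u2)]


e = gram_schmidt([u1, u2])        # orthonormal basis e_1, e_2 of range T
f = [0.5, -1.0, 2.0, 3.0]
Tf = T(f)
coeffs = [dot(Tf, ej) for ej in e]                 # <Tf, e_j>
recon = [sum(c * ej[i] for c, ej in zip(coeffs, e)) for i in range(len(Tf))]
print(Tf, recon)                                    # agree up to rounding
```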


Not every bounded operator is compact. For example, the identity map on an 
infinite-dimensional Hilbert space is not compact (to see this, consider an orthonormal 
sequence, which does not have a convergent subsequence because the distance 
between any two distinct elements of the orthonormal sequence is $\sqrt{2}$).


10.68 compact operators are bounded 


Every compact operator on a Hilbert space is a bounded operator. 


Proof We show that if $T$ is an operator that is not bounded, then $T$ is not compact.
To do this, suppose $V$ is a Hilbert space and $T$ is an operator on $V$ that is not bounded.
Thus there exists a bounded sequence $f_1, f_2, \dots$ in $V$ such that $\lim_{n\to\infty}\|Tf_n\| = \infty$.
Hence no subsequence of $Tf_1, Tf_2, \dots$ converges, which means $T$ is not compact.
If $V$ is a Hilbert space, then a two-sided ideal of $\mathcal{B}(V)$ is a subspace of
$\mathcal{B}(V)$ that is closed under multiplication on either side by bounded operators on $V$.
The next result states that the set of compact operators on $V$ is a two-sided ideal
of $\mathcal{B}(V)$ that is closed in the topology on $\mathcal{B}(V)$ that comes from the norm.

If $V$ is finite-dimensional, then the only two-sided ideals of $\mathcal{B}(V)$ are
$\{0\}$ and $\mathcal{B}(V)$. In contrast, if $V$ is infinite-dimensional, then the next
result shows that $\mathcal{B}(V)$ has a closed two-sided ideal that is neither $\{0\}$
nor $\mathcal{B}(V)$.


10.69 C(V) is a closed two-sided ideal of B(V)


Suppose V is a Hilbert space. 


(a) $\mathcal{C}(V)$ is a closed subspace of $\mathcal{B}(V)$.
(b) If $T \in \mathcal{C}(V)$ and $S \in \mathcal{B}(V)$, then $ST \in \mathcal{C}(V)$ and $TS \in \mathcal{C}(V)$.


Proof Suppose $f_1, f_2, \dots$ is a bounded sequence in $V$.

To prove that $\mathcal{C}(V)$ is closed under addition, suppose $S, T \in \mathcal{C}(V)$. Because
$S$ is compact, $Sf_1, Sf_2, \dots$ has a convergent subsequence $Sf_{n_1}, Sf_{n_2}, \dots$. Because
$T$ is compact, some subsequence of $Tf_{n_1}, Tf_{n_2}, \dots$ converges. Thus we have a
subsequence of $(S + T)f_1, (S + T)f_2, \dots$ that converges. Hence $S + T \in \mathcal{C}(V)$.

The proof that $\mathcal{C}(V)$ is closed under scalar multiplication is easier and is left to
the reader. Thus we now know that $\mathcal{C}(V)$ is a subspace of $\mathcal{B}(V)$.

To show that $\mathcal{C}(V)$ is closed in $\mathcal{B}(V)$, suppose $T \in \mathcal{B}(V)$ and there is a sequence
$T_1, T_2, \dots$ in $\mathcal{C}(V)$ such that $\lim_{m\to\infty}\|T - T_m\| = 0$. To show that $T$ is compact, we
need to show that $Tf_{n_1}, Tf_{n_2}, \dots$ is a Cauchy sequence for some increasing sequence
of positive integers $n_1 < n_2 < \cdots$.

Because $T_1$ is compact, there is an infinite set $Z_1 \subset \mathbf{Z}^+$ with $\|T_1 f_j - T_1 f_k\| < 1$
for all $j, k \in Z_1$. Let $n_1$ be the smallest element of $Z_1$.

Now suppose $m \in \mathbf{Z}^+$ with $m > 1$ and an infinite set $Z_{m-1} \subset \mathbf{Z}^+$ and
$n_{m-1} \in Z_{m-1}$ have been chosen. Because $T_m$ is compact, there is an infinite set
$Z_m \subset Z_{m-1}$ with
$$\|T_m f_j - T_m f_k\| < \frac{1}{m}$$
for all $j, k \in Z_m$. Let $n_m$ be the smallest element of $Z_m$ such that $n_m > n_{m-1}$.

Thus we produce an increasing sequence $n_1 < n_2 < \cdots$ of positive integers and
a decreasing sequence $Z_1 \supset Z_2 \supset \cdots$ of infinite subsets of $\mathbf{Z}^+$.

If $m \in \mathbf{Z}^+$ and $j, k \ge m$, then

$$
\begin{aligned}
\|Tf_{n_j} - Tf_{n_k}\| &\le \|Tf_{n_j} - T_m f_{n_j}\| + \|T_m f_{n_j} - T_m f_{n_k}\| + \|T_m f_{n_k} - Tf_{n_k}\|\\
&\le \|T - T_m\|\bigl(\|f_{n_j}\| + \|f_{n_k}\|\bigr) + \frac{1}{m}.
\end{aligned}
$$

We can make the first term on the last line above as small as we want by choosing
$m$ large (because $\lim_{m\to\infty}\|T - T_m\| = 0$ and the sequence $f_1, f_2, \dots$ is bounded).
Thus $Tf_{n_1}, Tf_{n_2}, \dots$ is a Cauchy sequence, as desired, completing the proof of (a).

To prove (b), suppose $T \in \mathcal{C}(V)$ and $S \in \mathcal{B}(V)$. Hence some subsequence of
$Tf_1, Tf_2, \dots$ converges, and applying $S$ to that subsequence gives another convergent
sequence. Thus $ST \in \mathcal{C}(V)$. Similarly, $Sf_1, Sf_2, \dots$ is a bounded sequence, and thus
$T(Sf_1), T(Sf_2), \dots$ has a convergent subsequence; thus $TS \in \mathcal{C}(V)$.
The previous result now allows us to see many new examples of compact operators.


10.70 compact integral operators 


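Suppose $(X, \mathcal{S}, \mu)$ is a $\sigma$-finite measure space, $K \in L^2(\mu \times \mu)$, and $\mathcal{K}_K$ is the
integral operator on $L^2(\mu)$ defined by
$$(\mathcal{K}_K f)(x) = \int_X K(x, y) f(y)\, d\mu(y)$$
for $f \in L^2(\mu)$ and $x \in X$. Then $\mathcal{K}_K$ is a compact operator.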


Proof Example 10.5 shows that $\mathcal{K}_K$ is a bounded operator on $L^2(\mu)$.
First consider the case where there exist $g, h \in L^2(\mu)$ such that
$$K(x, y) = g(x)h(y) \tag{10.71}$$
for almost every $(x, y) \in X \times X$. In that case, if $f \in L^2(\mu)$ then
$$(\mathcal{K}_K f)(x) = \int_X g(x)h(y)f(y)\, d\mu(y) = \langle f, \bar h\rangle g(x)$$
for almost every $x \in X$. Thus $\mathcal{K}_K f = \langle f, \bar h\rangle g$. In other words, $\mathcal{K}_K$ has a one-dimensional
range in this case (or a zero-dimensional range if $g = 0$ or $h = 0$). Hence 10.67
implies that $\mathcal{K}_K$ is compact.

Now consider the case where $K$ is a finite sum of functions of the form given by
the right side of 10.71. Then because the set of compact operators on $V$ is closed
under addition [by 10.69(a)], the operator $\mathcal{K}_K$ is compact in this case.

Next, consider the case of $K \in L^2(\mu \times \mu)$ such that $K$ is the limit in $L^2(\mu \times \mu)$
of a sequence of functions $K_1, K_2, \dots$, each of which is of the form discussed in the
previous paragraph. Then
$$\|\mathcal{K}_K - \mathcal{K}_{K_n}\| = \|\mathcal{K}_{K - K_n}\| \le \|K - K_n\|_2,$$
where the inequality above comes from 10.8. Thus $\mathcal{K}_K = \lim_{n\to\infty}\mathcal{K}_{K_n}$. By the
previous paragraph, each $\mathcal{K}_{K_n}$ is compact. Because the set of compact operators is a
closed subset of $\mathcal{B}(V)$ [by 10.69(a)], we conclude that $\mathcal{K}_K$ is compact.

We finish the proof by showing that the case considered in the previous paragraph
includes all $K \in L^2(\mu \times \mu)$. To do this, suppose $F \in L^2(\mu \times \mu)$ is orthogonal to all
the elements of $L^2(\mu \times \mu)$ of the form considered in the previous paragraph. Thus
$$0 = \int_{X \times X} g(x)h(y)\overline{F(x, y)}\, d(\mu \times \mu)(x, y) = \int_X g(x) \int_X h(y)\overline{F(x, y)}\, d\mu(y)\, d\mu(x)$$
for all $g, h \in L^2(\mu)$, where we have used Tonelli's Theorem, Fubini's Theorem, and
Hölder's inequality (with $p = 2$). For fixed $h \in L^2(\mu)$, the right side above equalling
0 for all $g \in L^2(\mu)$ implies that
$$\int_X h(y)\overline{F(x, y)}\, d\mu(y) = 0$$
for almost every $x \in X$. Now $F(x, y) = 0$ for almost every $(x, y) \in X \times X$ [because
the equation above holds for all $h \in L^2(\mu)$], which by 8.42 completes the proof.
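The key estimate in the proof is $\|\mathcal{K}_K\| \le \|K\|_{L^2(\mu\times\mu)}$ (from 10.8). A minimal discretized sketch illustrates its matrix analogue. It assumes Lebesgue measure on $[0,1]$, a midpoint-rule discretization with weight $w = 1/n$, and two hypothetical kernels. With that weighting, the discrete operator norm is the 2-norm of the matrix $(wK(x_i, y_j))$, and the discrete $L^2(\mu\times\mu)$ norm of $K$ is its Frobenius norm. Power iteration gives a lower bound for the operator norm, which should never exceed the Frobenius norm. For a rank-one kernel $g(x)h(y)$ as in 10.71, the two should agree.

```python
import math


def discretize_kernel(K, n):
    """Matrix of f -> int_0^1 K(x, y) f(y) dy using the midpoint rule with n nodes."""
    nodes = [(i + 0.5) / n for i in range(n)]
    w = 1.0 / n
    return [[K(x, y) * w for y in nodes] for x in nodes]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def operator_norm_estimate(A, iters=100):
    """Power iteration on A^T A; the returned ||A x|| with ||x|| = 1 never exceeds ||A||."""
    At = [list(col) for col in zip(*A)]
    x = [1.0] * len(A[0])
    for _ in range(iters):
        y = matvec(At, matvec(A, x))
        s = math.sqrt(sum(t * t for t in y))
        x = [t / s for t in y]
    return math.sqrt(sum(t * t for t in matvec(A, x)))


def frobenius(A):
    """Discrete analogue of ||K||_{L^2(mu x mu)} for the weighted matrix."""
    return math.sqrt(sum(a * a for row in A for a in row))


gauss = discretize_kernel(lambda x, y: math.exp(-(x - y) ** 2), 60)
rank_one = discretize_kernel(lambda x, y: (1 + x) * y, 40)   # K(x, y) = g(x) h(y)
print(operator_norm_estimate(gauss), frobenius(gauss))        # first <= second
print(operator_norm_estimate(rank_one), frobenius(rank_one))  # equal up to rounding
```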
As a special case of the previous result, we can now see that the Volterra operator
$\mathcal{V}: L^2([0,1]) \to L^2([0,1])$ defined by
$$(\mathcal{V}f)(x) = \int_0^x f$$
is compact. This holds because, as shown in Example 10.15, the Volterra operator is
an integral operator of the type considered in the previous result.

The Volterra operator is injective [because differentiating both sides of the equation
$\int_0^x f = 0$ with respect to $x$ and using the Lebesgue Differentiation Theorem (4.19)
shows that $f = 0$]. Thus the Volterra operator is an example of a compact operator
with infinite-dimensional range. The next example provides another class of compact
operators that do not necessarily have finite-dimensional range.


10.72 Example compact multiplication operators on ℓ²


Suppose $b_1, b_2, \dots$ is a sequence in $\mathbf{F}$ such that $\lim_{n\to\infty} b_n = 0$. Define a bounded
linear map $T: \ell^2 \to \ell^2$ by
$$T(a_1, a_2, \dots) = (a_1 b_1, a_2 b_2, \dots)$$
and for $n \in \mathbf{Z}^+$, define a bounded linear map $T_n: \ell^2 \to \ell^2$ by
$$T_n(a_1, a_2, \dots) = (a_1 b_1, a_2 b_2, \dots, a_n b_n, 0, 0, \dots).$$
Note that each $T_n$ is a bounded operator with finite-dimensional range and thus is
compact (by 10.67). The condition $\lim_{n\to\infty} b_n = 0$ implies that $\lim_{n\to\infty} T_n = T$.
Thus $T$ is compact because $\mathcal{C}(\ell^2)$ is a closed subset of $\mathcal{B}(\ell^2)$ [by 10.69(a)].
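The norm computation behind $\lim_{n\to\infty} T_n = T$ is $\|T - T_n\| = \sup_{k > n}|b_k|$, since $T - T_n$ multiplies the $k$th coordinate by $b_k$ for $k > n$ and by 0 otherwise. The minimal sketch below works on finite truncations of $\ell^2$ with the illustrative choice $b_k = 1/k$ (an assumption, not part of the example). It checks the bound $\|(T - T_n)a\| \le \sup_{k>n}|b_k|\,\|a\|$, shows that the bound is attained at $e_{n+1}$, and shows the supremum shrinking to 0.

```python
def apply_T(b, a):
    """T(a_1, a_2, ...) = (a_1 b_1, a_2 b_2, ...) on finite truncations."""
    return [x * y for x, y in zip(a, b)]


def apply_Tn(b, a, n):
    """T_n keeps the first n products and replaces the rest by 0."""
    return [x * y if k < n else 0.0 for k, (x, y) in enumerate(zip(a, b))]


def norm(v):
    return sum(abs(t) ** 2 for t in v) ** 0.5


def tail_sup(b, n):
    """||T - T_n|| = sup_{k > n} |b_k|, taken over the finitely many b_k kept."""
    return max((abs(t) for t in b[n:]), default=0.0)


N = 1000                                   # truncation length (assumption)
b = [1.0 / k for k in range(1, N + 1)]     # illustrative choice b_k = 1/k
for n in (1, 10, 100):
    print(n, tail_sup(b, n))               # equals 1/(n+1), which tends to 0
```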


The next result states that an operator is compact if and only if its adjoint is 
compact. 


10.73 T compact ⟺ T* compact


Suppose T is a bounded operator on a Hilbert space. Then T is compact if and 
only if T* is compact. 


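Proof First suppose $T$ is compact. We want to prove that $T^*$ is compact. To do
this, suppose $f_1, f_2, \dots$ is a bounded sequence in $V$. Because $TT^*$ is compact [by
10.69(b)], some subsequence $TT^*f_{n_1}, TT^*f_{n_2}, \dots$ converges. Now

$$
\begin{aligned}
\|T^*f_{n_j} - T^*f_{n_k}\|^2 &= \bigl\langle T^*(f_{n_j} - f_{n_k}), T^*(f_{n_j} - f_{n_k})\bigr\rangle\\
&= \bigl\langle TT^*(f_{n_j} - f_{n_k}), f_{n_j} - f_{n_k}\bigr\rangle\\
&\le \|TT^*(f_{n_j} - f_{n_k})\|\,\|f_{n_j} - f_{n_k}\|.
\end{aligned}
$$

Because $TT^*f_{n_1}, TT^*f_{n_2}, \dots$ is a Cauchy sequence and the sequence $f_1, f_2, \dots$ is bounded,
the inequality above implies that $T^*f_{n_1}, T^*f_{n_2}, \dots$ is a Cauchy sequence and hence
converges. Thus $T^*$ is a compact operator, completing the proof that if $T$ is compact,
then $T^*$ is compact.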

Now suppose T* is compact. By the result proved in the paragraph above, (T*)* 
is compact. Because (T*)* = T (see 10.11), we conclude that T is compact. 
Spectrum of Compact Operator and Fredholm Alternative


We noted earlier that the identity map on an infinite-dimensional Hilbert space is not 
compact. The next result shows that much more is true. 


10.74 no infinite-dimensional closed subspace in range of compact operator 


The range of each compact operator on a Hilbert space contains no infinite- 
dimensional closed subspaces. 


Proof Suppose T is a bounded operator on a Hilbert space V and U is an infinite- 
dimensional closed subspace contained in range T. We want to show that T is not 
compact. 

Because $T$ is a continuous operator, $T^{-1}(U)$ is a closed subspace of $V$. Let
$S = T|_{T^{-1}(U)}$. Thus $S$ is a surjective bounded linear map from the Hilbert space
$T^{-1}(U)$ onto the Hilbert space $U$ [here $T^{-1}(U)$ and $U$ are Hilbert spaces by 6.16(b)].
The Open Mapping Theorem (6.81) implies $S$ maps the open unit ball of $T^{-1}(U)$ to
an open subset of $U$. Thus there exists $r > 0$ such that

$$\{g \in U : \|g\| < r\} \subset \{Tf : f \in T^{-1}(U) \text{ and } \|f\| < 1\}. \tag{10.75}$$

Because $U$ is an infinite-dimensional Hilbert space, there exists an orthonormal
sequence $e_1, e_2, \dots$ in $U$, as can be seen by applying the Gram–Schmidt process (see
the proof of 8.67) to any linearly independent sequence in $U$. Each $\frac{r}{2}e_n$ is in the left
side of 10.75. Thus for each $n \in \mathbf{Z}^+$, there exists $f_n \in T^{-1}(U)$ such that $\|f_n\| < 1$
and $Tf_n = \frac{r}{2}e_n$. The sequence $f_1, f_2, \dots$ is bounded, but the sequence $Tf_1, Tf_2, \dots$
has no convergent subsequence because
$$\Bigl\|\frac{r}{2}e_j - \frac{r}{2}e_k\Bigr\| = \frac{r}{\sqrt{2}}$$
for $j \neq k$. Thus $T$ is not compact, as desired.


Suppose T is a compact operator on an infinite-dimensional Hilbert space. The 
result above implies that T is not surjective. In particular, T is not invertible. Thus 
we have the following result. 


10.76 compact implies not invertible on infinite-dimensional Hilbert spaces 


If T is a compact operator on an infinite-dimensional Hilbert space, then 


$$0 \in \operatorname{sp}(T).$$


Although 10.74 shows that if $T$ is compact then range $T$ contains no infinite-dimensional
closed subspaces, the next result shows that the situation differs drastically
for $T - \alpha I$ if $\alpha \in \mathbf{F} \setminus \{0\}$.

The proof of the next result makes use of the restriction of $T - \alpha I$ to the closed
subspace $(\operatorname{null}(T - \alpha I))^\perp$. As motivation for considering this restriction, recall that
each $f \in V$ can be written uniquely as $f = g + h$, where $g \in \operatorname{null}(T - \alpha I)$ and
$h \in (\operatorname{null}(T - \alpha I))^\perp$ (see 8.43). Thus $(T - \alpha I)f = (T - \alpha I)h$, which implies that
$$\operatorname{range}(T - \alpha I) = (T - \alpha I)\bigl((\operatorname{null}(T - \alpha I))^\perp\bigr).$$




10.77 closed range 


If $T$ is a compact operator on a Hilbert space, then $T - \alpha I$ has closed range for
every $\alpha \in \mathbf{F}$ with $\alpha \neq 0$.


Proof Suppose $T$ is a compact operator on a Hilbert space $V$ and $\alpha \in \mathbf{F}$ is such that
$\alpha \neq 0$.

10.78 Claim: there exists $r > 0$ such that
$$\|f\| \le r\|(T - \alpha I)f\| \quad \text{for all } f \in (\operatorname{null}(T - \alpha I))^\perp.$$

To prove the claim above, suppose it is false. Then for each $n \in \mathbf{Z}^+$, there exists
$f_n \in (\operatorname{null}(T - \alpha I))^\perp$ such that
$$\|f_n\| = 1 \quad \text{and} \quad \|(T - \alpha I)f_n\| < \frac{1}{n}.$$
Because $T$ is compact, there exists a subsequence $Tf_{n_1}, Tf_{n_2}, \dots$ such that
$$\lim_{k\to\infty} Tf_{n_k} = g \tag{10.79}$$
for some $g \in V$. Subtracting the equation
$$\lim_{k\to\infty} (T - \alpha I)f_{n_k} = 0 \tag{10.80}$$
from 10.79 and then dividing by $\alpha$ shows that
$$\lim_{k\to\infty} f_{n_k} = \frac{1}{\alpha}g.$$

The equation above implies $\|g\| = |\alpha|$; hence $g \neq 0$. Each $f_{n_k} \in (\operatorname{null}(T - \alpha I))^\perp$;
hence we also conclude that $g \in (\operatorname{null}(T - \alpha I))^\perp$. Applying $T - \alpha I$ to both sides of
the equation above and using 10.80 shows that $g \in \operatorname{null}(T - \alpha I)$. Thus $g$ is a nonzero
element of both $\operatorname{null}(T - \alpha I)$ and its orthogonal complement. This contradiction
completes the proof of the claim in 10.78.

To show that $\operatorname{range}(T - \alpha I)$ is closed, suppose $h_1, h_2, \dots$ is a sequence in
$\operatorname{range}(T - \alpha I)$ that converges to some $h \in V$. For each $n \in \mathbf{Z}^+$, there exists
$f_n \in (\operatorname{null}(T - \alpha I))^\perp$ such that $(T - \alpha I)f_n = h_n$. Because $h_1, h_2, \dots$ is a Cauchy
sequence, 10.78 shows that $f_1, f_2, \dots$ is also a Cauchy sequence. Thus there exists
$f \in V$ such that $\lim_{n\to\infty} f_n = f$, which implies $h = (T - \alpha I)f \in \operatorname{range}(T - \alpha I)$.
Hence $\operatorname{range}(T - \alpha I)$ is closed.


Suppose $T$ is a compact operator on a Hilbert space $V$ and $f \in V$ and $\alpha \in \mathbf{F} \setminus \{0\}$.
An immediate consequence (often useful when investigating integral equations) of
the result above and 10.13(d) is that the equation
$$Tg - \alpha g = f$$
has a solution $g \in V$ if and only if $\langle f, h\rangle = 0$ for every $h \in V$ such that $T^*h = \bar\alpha h$.
10.81 Definition geometric multiplicity

• The geometric multiplicity of an eigenvalue $\alpha$ of an operator $T$ is defined to
be the dimension of $\operatorname{null}(T - \alpha I)$.

• In other words, the geometric multiplicity of an eigenvalue $\alpha$ of $T$ is the
dimension of the subspace consisting of 0 and all the eigenvectors of $T$
corresponding to $\alpha$.


There exist compact operators for which the eigenvalue 0 has infinite geometric 
multiplicity. The next result shows that this cannot happen for nonzero eigenvalues. 


10.82 nonzero eigenvalues of compact operators have finite multiplicity 


Suppose $T$ is a compact operator on a Hilbert space and $\alpha \in \mathbf{F}$ with $\alpha \neq 0$. Then
$\operatorname{null}(T - \alpha I)$ is finite-dimensional.


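Proof Suppose $f \in \operatorname{null}(T - \alpha I)$. Then $f = T\bigl(\frac{f}{\alpha}\bigr)$. Hence $f \in \operatorname{range} T$.
Thus we have shown that $\operatorname{null}(T - \alpha I) \subset \operatorname{range} T$. Because $T$ is continuous,
$\operatorname{null}(T - \alpha I)$ is closed. Thus 10.74 implies that $\operatorname{null}(T - \alpha I)$ is finite-dimensional.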


The next lemma is used in our proof of the Fredholm Alternative (10.85). Note 
that this lemma implies that every injective operator on a finite-dimensional vector 
space is surjective (because a finite-dimensional vector space cannot have an infinite 
chain of strictly decreasing subspaces—the dimension decreases by at least 1 in each 
step). Also, see Exercise 10 for the analogous result implying that every surjective 
operator on a finite-dimensional vector space is injective. 


10.83 injective but not surjective 


If T is an injective but not surjective operator on a vector space, then 


$$\operatorname{range} T \supsetneq \operatorname{range} T^2 \supsetneq \operatorname{range} T^3 \supsetneq \cdots.$$


Proof Suppose $T$ is an injective but not surjective operator on a vector space $V$.
Suppose $n \in \mathbf{Z}^+$. If $g \in V$, then
$$T^{n+1}g = T^n(Tg) \in \operatorname{range} T^n.$$
Thus $\operatorname{range} T^n \supset \operatorname{range} T^{n+1}$.

To show that the last inclusion is not an equality, note that because $T$ is not
surjective, there exists $f \in V$ such that
$$f \notin \operatorname{range} T. \tag{10.84}$$
Now $T^n f \in \operatorname{range} T^n$. However, $T^n f \notin \operatorname{range} T^{n+1}$ because if $g \in V$ and
$T^n f = T^{n+1}g$, then $T^n f = T^n(Tg)$, which would imply that $f = Tg$ (because $T^n$
is injective), which would contradict 10.84. Thus $\operatorname{range} T^n \neq \operatorname{range} T^{n+1}$.
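The right shift on the vector space of finitely supported sequences is a standard example of an injective operator that is not surjective. The sketch below models such sequences as Python lists, an assumption made for illustration, and follows the proof with $f = e_1 \notin \operatorname{range} T$. It relies on the fact that $\operatorname{range} T^n$ consists of the sequences whose first $n$ entries are 0. The vector $T^n f = e_{n+1}$ then lies in $\operatorname{range} T^n$ but not in $\operatorname{range} T^{n+1}$.

```python
def right_shift(a):
    """Right shift on finitely supported sequences: (a1, a2, ...) -> (0, a1, a2, ...)."""
    return [0] + list(a)


def apply_power(op, n, a):
    """Apply op to a, n times."""
    for _ in range(n):
        a = op(a)
    return a


def in_range_of_power(a, n):
    """For the right shift T, a is in range T^n exactly when its first n entries are 0."""
    return all(x == 0 for x in a[:n])


f = [1]                      # f = e_1 is not in range T
for n in range(1, 6):
    g = apply_power(right_shift, n, f)                               # T^n f = e_{n+1}
    print(n, in_range_of_power(g, n), in_range_of_power(g, n + 1))  # True False
```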
Compact operators behave, in some respects, like operators on a finite-dimensional
vector space. For example, the following important theorem should be familiar to you
in the finite-dimensional context (where the choice of $\alpha = 0$ need not be excluded).


10.85 Fredholm Alternative 


Suppose $T$ is a compact operator on a Hilbert space and $\alpha \in \mathbf{F}$ with $\alpha \neq 0$. Then
the following are equivalent:

(a) $\alpha \in \operatorname{sp}(T)$.

(b) $\alpha$ is an eigenvalue of $T$.

(c) $T - \alpha I$ is not surjective.


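Proof Clearly (b) implies (a) and (c) implies (a).

To prove that (a) implies (b), suppose $\alpha \in \operatorname{sp}(T)$ but $\alpha$ is not an eigenvalue of $T$.
Thus $T - \alpha I$ is injective but $T - \alpha I$ is not surjective (if it were also surjective, it would be
invertible by the Open Mapping Theorem). Thus 10.83 applied to $T - \alpha I$
shows that
$$\operatorname{range}(T - \alpha I) \supsetneq \operatorname{range}(T - \alpha I)^2 \supsetneq \operatorname{range}(T - \alpha I)^3 \supsetneq \cdots. \tag{10.86}$$
If $n \in \mathbf{Z}^+$, then the Binomial Theorem and 10.69 show that
$$(T - \alpha I)^n = S + (-\alpha)^n I$$
for some compact operator $S$. Now 10.77 shows that $\operatorname{range}(T - \alpha I)^n$ is a closed
subspace of the Hilbert space on which $T$ operates. Thus 10.86 implies that for each
$n \in \mathbf{Z}^+$, there exists
$$f_n \in \operatorname{range}(T - \alpha I)^n \cap \bigl(\operatorname{range}(T - \alpha I)^{n+1}\bigr)^\perp \tag{10.87}$$
such that $\|f_n\| = 1$.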
Now suppose $j, k \in \mathbf{Z}^+$ with $j < k$. Then
$$Tf_j - Tf_k = (T - \alpha I)f_j - (T - \alpha I)f_k - \alpha f_k + \alpha f_j. \tag{10.88}$$
Because $f_j$ and $f_k$ are both in $\operatorname{range}(T - \alpha I)^j$, the first two terms on the right side of
10.88 are in $\operatorname{range}(T - \alpha I)^{j+1}$. Because $j + 1 \le k$, the third term in 10.88 is also in
$\operatorname{range}(T - \alpha I)^{j+1}$. Now 10.87 implies that the last term in 10.88 is orthogonal to
the sum of the first three terms. Thus 10.88 leads to the inequality
$$\|Tf_j - Tf_k\| \ge \|\alpha f_j\| = |\alpha|.$$
The inequality above implies that $Tf_1, Tf_2, \dots$ has no convergent subsequence, which
contradicts the compactness of $T$. This contradiction means the assumption that $\alpha$ is
not an eigenvalue of $T$ was false, completing the proof that (a) implies (b).

At this stage, we know that (a) and (b) are equivalent and that (c) implies (a). To
prove that (a) implies (c), suppose $\alpha \in \operatorname{sp}(T)$. Thus $\bar\alpha \in \operatorname{sp}(T^*)$. Applying the
equivalence of (a) and (b) to $T^*$ (which is compact by 10.73), we conclude that $\bar\alpha$ is an eigenvalue of $T^*$. Thus
applying 10.13(d) to $T - \alpha I$ shows that $T - \alpha I$ is not surjective, completing the proof
that (a) implies (c).
The previous result traditionally has the word alternative in its name because it
can be rephrased as follows:

If $T$ is a compact operator on a Hilbert space $V$ and $\alpha \in \mathbf{F} \setminus \{0\}$, then
exactly one of the following holds:

1. the equation $Tf = \alpha f$ has a nonzero solution $f \in V$;

2. the equation $g = Tf - \alpha f$ has a solution $f \in V$ for every $g \in V$.
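In finite dimensions the alternative reduces to whether $\det(T - \alpha I)$ vanishes. The following sketch is restricted, as an assumption, to $2\times 2$ matrices with rational entries and uses exact `fractions.Fraction` arithmetic. It returns either a nonzero solution of $Tf = \alpha f$ or the solution of $Tf - \alpha f = g$, and exactly one case occurs. The function name `fredholm_2x2` is hypothetical.

```python
from fractions import Fraction


def fredholm_2x2(T, alpha, g):
    """Finite-dimensional Fredholm Alternative for a 2x2 rational matrix T.

    Returns ("eigenvector", f) with T f = alpha f and f != 0, or
    ("solution", f) with T f - alpha f = g.
    """
    M = [[Fraction(T[0][0]) - alpha, Fraction(T[0][1])],
         [Fraction(T[1][0]), Fraction(T[1][1]) - alpha]]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        # M has rank at most 1, so (-b, a) is annihilated by the nonzero row (a, b)
        # and by the other row, which is a multiple of it.
        for a, b in M:
            if a != 0 or b != 0:
                return "eigenvector", [-b, a]
        return "eigenvector", [Fraction(1), Fraction(0)]  # M = 0
    # Cramer's rule solves M f = g uniquely.
    f0 = (g[0] * M[1][1] - M[0][1] * g[1]) / det
    f1 = (M[0][0] * g[1] - g[0] * M[1][0]) / det
    return "solution", [f0, f1]
```

For instance, with $T = \begin{pmatrix} 2 & 1\\ 0 & 3\end{pmatrix}$, the value $\alpha = 2$ yields an eigenvector, while $\alpha = 1$ yields a solution for every right side $g$.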


The next example shows the power of the Fredholm Alternative. In this example,
we want to show that $\mathcal{V} - \alpha I$ is invertible for all $\alpha \in \mathbf{F} \setminus \{0\}$. The verification
that $\mathcal{V} - \alpha I$ is injective is straightforward. Showing that $\mathcal{V} - \alpha I$ is surjective would
require more work. However, the Fredholm Alternative tells us, with no further work,
that $\mathcal{V} - \alpha I$ is invertible.


10.89 Example spectrum of the Volterra operator 


We want to show that the spectrum of the Volterra operator $\mathcal{V}$ is $\{0\}$ (see Example
10.15 for the definition of $\mathcal{V}$). The Volterra operator $\mathcal{V}$ is compact (see the comment
after the proof of 10.70). Thus $0 \in \operatorname{sp}(\mathcal{V})$, by 10.76.

Suppose $\alpha \in \mathbf{F} \setminus \{0\}$. To show that $\alpha \notin \operatorname{sp}(\mathcal{V})$, we need only show that $\alpha$ is not
an eigenvalue of $\mathcal{V}$ (by 10.85). Thus suppose $f \in L^2([0,1])$ and $\mathcal{V}f = \alpha f$. Hence
$$\int_0^x f = \alpha f(x) \tag{10.90}$$
for almost every $x \in [0,1]$. The left side of 10.90 is a continuous function of $x$ and
thus so is the right side, which implies that $f$ is continuous. The continuity of $f$
now implies that the left side of 10.90 has a continuous derivative, and thus $f$ has a
continuous derivative.

Now differentiate both sides of 10.90 with respect to $x$, getting
$$f(x) = \alpha f'(x)$$
for all $x \in (0,1)$. Standard calculus shows that the equation above implies that
$$f(x) = ce^{x/\alpha}$$
for some constant $c$. However, 10.90 implies that the continuous function $f$ must
satisfy the equation $f(0) = 0$. Thus $c = 0$, which implies $f = 0$.

The conclusion of the last paragraph shows that $\alpha$ is not an eigenvalue of $\mathcal{V}$. The
Fredholm Alternative (10.85) now shows that $\alpha \notin \operatorname{sp}(\mathcal{V})$. Thus $\operatorname{sp}(\mathcal{V}) = \{0\}$.
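A finite-dimensional caricature of this example may help. Replacing $\int_0^x f$ by a left Riemann sum on $n$ equal subintervals gives a strictly lower-triangular matrix $N$. Such a matrix is nilpotent ($N^n = 0$), so its only eigenvalue is 0: if $Nf = \lambda f$, then $0 = N^n f = \lambda^n f$. This discretization is an illustrative assumption and proves nothing about $\mathcal{V}$ itself; it merely mirrors the conclusion $\operatorname{sp}(\mathcal{V}) = \{0\}$. Exact rational arithmetic is used so that the zero test is exact.

```python
from fractions import Fraction


def volterra_matrix(n):
    """Left Riemann sum for (Vf)(x) = int_0^x f on n equal subintervals of [0, 1].

    Entry (i, j) is h = 1/n when j < i and 0 otherwise: strictly lower triangular.
    """
    h = Fraction(1, n)
    return [[h if j < i else Fraction(0) for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matrix_power(A, k):
    """A^k for k >= 1 by repeated multiplication."""
    P = A
    for _ in range(k - 1):
        P = matmul(P, A)
    return P


def is_zero(A):
    return all(x == 0 for row in A for x in row)


n = 6
V = volterra_matrix(n)
print(is_zero(matrix_power(V, n - 1)), is_zero(matrix_power(V, n)))  # False True
```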


If $\alpha$ is an eigenvalue of an operator $T$ on a finite-dimensional Hilbert space,
then $\bar\alpha$ is an eigenvalue of $T^*$. This result does not hold for bounded operators on
infinite-dimensional Hilbert spaces.

However, suppose $T$ is a compact operator on a Hilbert space and $\alpha$ is a nonzero
eigenvalue of $T$. Thus $\alpha \in \operatorname{sp}(T)$, which implies that $\bar\alpha \in \operatorname{sp}(T^*)$ (because a
bounded operator is invertible if and only if its adjoint is invertible). The Fredholm
Alternative (10.85), applied to the compact operator $T^*$ (see 10.73), now shows that $\bar\alpha$ is an eigenvalue of $T^*$. Thus compactness
allows us to recover the finite-dimensional result (except for the case $\alpha = 0$).
Our next result states that if $T$ is a compact operator and $\alpha \neq 0$, then $\operatorname{null}(T - \alpha I)$
and $\operatorname{null}(T^* - \bar\alpha I)$ have the same dimension (denoted $\dim$). This result about
the dimensions of spaces of eigenvectors is easier to prove in finite dimensions.
Specifically, suppose $S$ is an operator on a finite-dimensional Hilbert space $V$ (you
can think of $S = T - \alpha I$). Then
$$\dim \operatorname{null} S = \dim V - \dim \operatorname{range} S = \dim(\operatorname{range} S)^\perp = \dim \operatorname{null} S^*,$$
where the justification for each step should be familiar to you from finite-dimensional
linear algebra. This finite-dimensional proof does not work in infinite dimensions
because the expression $\dim V - \dim \operatorname{range} S$ could be of the form $\infty - \infty$.

Although the dimensions of the two null spaces in the result below are the same,
even in finite dimensions the two null spaces are not necessarily equal to each other
(but we do have equality of the two null spaces when $T$ is normal; see 10.56).
Note that both dimensions in the result below are finite (by 10.82 and 10.73).
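The finite-dimensional chain of equalities can be checked mechanically. The sketch below assumes a hypothetical real $3\times 3$ matrix $S$, so that $S^*$ is the transpose, and uses exact rational arithmetic. It computes both null-space dimensions from the rank via rank–nullity. It also shows that $e_1 \in \operatorname{null} S$ while $e_1 \notin \operatorname{null} S^*$, so the two null spaces have equal dimension without being equal.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c] / A[r][c]
                A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Hypothetical real matrix S, so S* is its transpose.
S = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 1]]
n = len(S)
dim_null_S = n - rank(S)                  # rank-nullity
dim_null_S_star = n - rank(transpose(S))
print(dim_null_S, dim_null_S_star)        # 1 1
```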


10.91 null spaces of T − αI and T* − ᾱI have same dimensions


Suppose $T$ is a compact operator on a Hilbert space and $\alpha \in \mathbf{F}$ with $\alpha \neq 0$. Then
$$\dim \operatorname{null}(T - \alpha I) = \dim \operatorname{null}(T^* - \bar\alpha I).$$


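Proof Suppose $\dim \operatorname{null}(T - \alpha I) < \dim \operatorname{null}(T^* - \bar\alpha I)$. Because $\operatorname{null}(T^* - \bar\alpha I)$
equals $(\operatorname{range}(T - \alpha I))^\perp$, there is a bounded injective linear map
$$R: \operatorname{null}(T - \alpha I) \to (\operatorname{range}(T - \alpha I))^\perp$$
that is not surjective. Let $V$ denote the Hilbert space on which $T$ operates, and let $P$
be the orthogonal projection of $V$ onto $\operatorname{null}(T - \alpha I)$. Define a linear map $S: V \to V$
by
$$S = T + RP.$$
Because $RP$ is a bounded operator with finite-dimensional range, $S$ is compact. Also,
$$S - \alpha I = (T - \alpha I) + RP.$$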


Every element of $\operatorname{range}(T - \alpha I)$ is orthogonal to every element of $\operatorname{range} RP$.
Suppose $f \in V$ and $(S - \alpha I)f = 0$. The equation above shows that $(T - \alpha I)f = 0$
and $RPf = 0$. Because $f \in \operatorname{null}(T - \alpha I)$, we see that $Pf = f$, which then implies
that $Rf = RPf = 0$, which then implies that $f = 0$ (because $R$ is injective). Hence
$S - \alpha I$ is injective.

However, because $R$ maps onto a proper subset of $(\operatorname{range}(T - \alpha I))^\perp$, we see that
$S - \alpha I$ is not surjective, which contradicts the equivalence of (b) and (c) in 10.85. This
contradiction means the assumption that $\dim \operatorname{null}(T - \alpha I) < \dim \operatorname{null}(T^* - \bar\alpha I)$
was false. Hence we have proved that
$$\dim \operatorname{null}(T - \alpha I) \ge \dim \operatorname{null}(T^* - \bar\alpha I) \tag{10.92}$$
for every compact operator $T$ and every $\alpha \in \mathbf{F} \setminus \{0\}$.

Now apply the conclusion of the previous paragraph to $T^*$ (which is compact by
10.73) and $\bar\alpha$, getting 10.92 with the inequality reversed, completing the proof.




The spectrum of an operator on a finite-dimensional Hilbert space is a finite set, 
consisting just of the eigenvalues of the operator. The spectrum of a compact operator 
on an infinite-dimensional Hilbert space can be an infinite set. However, our next 
result implies that if a compact operator has infinite spectrum, then that spectrum 
consists of 0 and a sequence in F with limit 0. 


10.93 spectrum of a compact operator 


Suppose T is a compact operator on a Hilbert space. Then 


$$\{\alpha \in \operatorname{sp}(T) : |\alpha| \ge \delta\}$$

is a finite set for every $\delta > 0$.


Proof Fix $\delta > 0$. Suppose there exist distinct $\alpha_1, \alpha_2, \dots$ in $\operatorname{sp}(T)$ with $|\alpha_n| \ge \delta$
for every $n \in \mathbf{Z}^+$. The Fredholm Alternative (10.85) implies that each $\alpha_n$ is an
eigenvalue of $T$. For $n \in \mathbf{Z}^+$, let
$$U_n = \operatorname{null}\bigl((T - \alpha_1 I)\cdots(T - \alpha_n I)\bigr),$$
and let $U_0 = \{0\}$. Because $T$ is continuous, each $U_n$ is a closed subspace of the
Hilbert space on which $T$ operates. Furthermore, $U_{n-1} \subset U_n$ for each $n \in \mathbf{Z}^+$
because operators of the form $T - \beta I$ and $T - \gamma I$ commute with each other.

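If $n \in \mathbf{Z}^+$ and $g$ is an eigenvector of $T$ corresponding to the eigenvalue $\alpha_n$, then
$g \in U_n$ but $g \notin U_{n-1}$ because
$$(T - \alpha_1 I)\cdots(T - \alpha_{n-1} I)g = (\alpha_n - \alpha_1)\cdots(\alpha_n - \alpha_{n-1})g \neq 0.$$
In other words, we have
$$U_{n-1} \subsetneq U_n.$$
Thus for each $n \in \mathbf{Z}^+$, there exists
$$e_n \in U_n \cap (U_{n-1})^\perp \tag{10.94}$$
such that $\|e_n\| = 1$.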
Now suppose $j, k \in \mathbf{Z}^+$ with $j < k$. Then
$$Te_j - Te_k = (T - \alpha_j I)e_j - (T - \alpha_k I)e_k + \alpha_j e_j - \alpha_k e_k. \tag{10.95}$$
Because $j \le k - 1$, the first three terms on the right side of 10.95 are in $U_{k-1}$. Now
10.94 implies that the last term in 10.95 is orthogonal to the sum of the first three
terms. Thus 10.95 leads to the inequality
$$\|Te_j - Te_k\| \ge \|\alpha_k e_k\| = |\alpha_k| \ge \delta.$$
The inequality above implies that $Te_1, Te_2, \dots$ has no convergent subsequence, which
contradicts the compactness of $T$. This contradiction means that the assumption that
$\operatorname{sp}(T)$ contains infinitely many elements with absolute value at least $\delta$ was false.
EXERCISES 10C


1 Prove that if $T$ is a compact operator on a Hilbert space $V$ and $e_1, e_2, \ldots$ is an
orthonormal sequence in $V$, then $\lim_{k \to \infty} Te_k = 0$.

2 Prove that if $T$ is a compact operator on $L^2([0,1])$, then $\lim_{n \to \infty} \sqrt{n}\, \|T(x^n)\|_2 = 0$,
where $x^n$ means the element of $L^2([0,1])$ defined by $x \mapsto x^n$.

3 Suppose $T$ is a compact operator on a Hilbert space $V$ and $f_1, f_2, \ldots$ is a
sequence in $V$ such that $\lim_{n \to \infty} \langle f_n, g \rangle = 0$ for every $g \in V$. Prove that
$\lim_{n \to \infty} \|Tf_n\| = 0$.

4 Suppose $h \in L^\infty(\mathbf{R})$. Define $M_h \in \mathcal{B}(L^2(\mathbf{R}))$ by $M_h f = fh$. Prove that if
$\|h\|_\infty > 0$, then $M_h$ is not compact.

5 Suppose $(b_1, b_2, \ldots) \in \ell^\infty$. Define $T\colon \ell^2 \to \ell^2$ by
$T(a_1, a_2, \ldots) = (a_1 b_1, a_2 b_2, \ldots)$.
Prove that $T$ is compact if and only if $\lim_{n \to \infty} b_n = 0$.

6 Suppose $T$ is a bounded operator on a Hilbert space $V$. Prove that if there exists
an orthonormal basis $\{e_k\}_{k \in \Gamma}$ of $V$ such that

$$\sum_{k \in \Gamma} \|Te_k\|^2 < \infty,$$

then $T$ is compact.

7 Suppose $T$ is a bounded operator on a Hilbert space $V$. Prove that if $\{e_k\}_{k \in \Gamma}$
and $\{f_j\}_{j \in \Omega}$ are orthonormal bases of $V$, then

$$\sum_{k \in \Gamma} \|Te_k\|^2 = \sum_{j \in \Omega} \|T^* f_j\|^2.$$

8 Suppose $T$ is a bounded operator on a Hilbert space. Prove that $T$ is compact if
and only if $T^*T$ is compact.

9 Prove that if $T$ is a compact operator on an infinite-dimensional Hilbert space,
then $\|I - T\| \geq 1$.

10 Show that if $T$ is a surjective but not injective operator on a vector space $V$, then

$$\operatorname{null} T \subsetneq \operatorname{null} T^2 \subsetneq \operatorname{null} T^3 \subsetneq \cdots.$$

11 Suppose $T$ is a compact operator on a Hilbert space and $\alpha \in \mathbf{F} \setminus \{0\}$.
(a) Prove that $\operatorname{range}(T - \alpha I)^{m-1} = \operatorname{range}(T - \alpha I)^m$ for some $m \in \mathbf{Z}^+$.
(b) Prove that $\operatorname{null}(T - \alpha I)^{n-1} = \operatorname{null}(T - \alpha I)^n$ for some $n \in \mathbf{Z}^+$.
(c) Show that the smallest positive integer $m$ that works in (a) equals the
smallest positive integer $n$ that works in (b).
12 Prove that if $f\colon [0,1] \to \mathbf{F}$ is a continuous function, then there exists a continuous
function $g\colon [0,1] \to \mathbf{F}$ such that

$$f(x) = g(x) + \int_0^x g$$

for all $x \in [0,1]$.

13 Suppose $S$ is a bounded invertible operator on a Hilbert space $V$ and $T$ is a
compact operator on $V$.
(a) Prove that $S + T$ has closed range.
(b) Prove that $S + T$ is injective if and only if $S + T$ is surjective.
(c) Prove that $\operatorname{null}(S + T)$ and $\operatorname{null}(S^* + T^*)$ are finite-dimensional.
(d) Prove that $\dim \operatorname{null}(S + T) = \dim \operatorname{null}(S^* + T^*)$.
(e) Prove that there exists $R \in \mathcal{B}(V)$ such that $\operatorname{range} R$ is finite-dimensional
and $S + T + R$ is invertible.

14 Suppose $T$ is a compact operator on a Hilbert space $V$. Prove that $\operatorname{range} T$ is a
separable subspace of $V$.

15 Suppose $T$ is a compact operator on a Hilbert space $V$ and $e_1, e_2, \ldots$ is an
orthonormal basis of $\operatorname{range} T$. Let $P_n$ denote the orthogonal projection of $V$
onto $\operatorname{span}\{e_1, \ldots, e_n\}$.
(a) Prove that $\lim_{n \to \infty} \|T - P_n T\| = 0$.
(b) Prove that an operator on a Hilbert space $V$ is compact if and only if it is the
limit in $\mathcal{B}(V)$ of a sequence of bounded operators with finite-dimensional
range.

16 Prove that if $T$ is a compact operator on a Hilbert space $V$, then there exists a
sequence $S_1, S_2, \ldots$ of invertible operators on $V$ such that $\lim_{n \to \infty} \|T - S_n\| = 0$.

17 Suppose $T$ is a bounded operator on a Hilbert space such that $p(T)$ is compact
for some nonzero polynomial $p$ with coefficients in $\mathbf{F}$. Prove that $\operatorname{sp}(T)$ is a
countable set.


Suppose $T$ is a bounded operator on a Hilbert space. The algebraic multiplicity
of an eigenvalue $\alpha$ of $T$ is defined to be the dimension of the subspace

$$\bigcup_{n=1}^\infty \operatorname{null}(T - \alpha I)^n.$$

As an easy example, if $T$ is the left shift as defined in the next exercise, then the
eigenvalue 0 of $T$ has geometric multiplicity 1 but algebraic multiplicity $\infty$.


The definition above of algebraic multiplicity is equivalent on finite-dimensional 
spaces to the common definition involving the multiplicity of a root of the charac- 
teristic polynomial. However, the definition used here is cleaner (no determinants 
needed) and has the advantage of working on infinite-dimensional Hilbert spaces. 


18 Suppose $T \in \mathcal{B}(\ell^2)$ is defined by $T(a_1, a_2, a_3, \ldots) = (a_2, a_3, a_4, \ldots)$. Suppose
also that $\alpha \in \mathbf{F}$ and $|\alpha| < 1$.
(a) Show that the geometric multiplicity of $\alpha$ as an eigenvalue of $T$ equals 1.
(b) Show that the algebraic multiplicity of $\alpha$ as an eigenvalue of $T$ equals $\infty$.

19 Prove that the geometric multiplicity of an eigenvalue of a normal operator on a
Hilbert space equals the algebraic multiplicity of that eigenvalue.

20 Prove that every nonzero eigenvalue of a compact operator on a Hilbert space
has finite algebraic multiplicity.

21 Prove that if $T$ is a compact operator on a Hilbert space and $\alpha$ is a nonzero
eigenvalue of $T$, then the algebraic multiplicity of $\alpha$ as an eigenvalue of $T$ equals
the algebraic multiplicity of $\overline{\alpha}$ as an eigenvalue of $T^*$.

22 Prove that if $V$ is a separable Hilbert space, then the Banach space $C(V)$ is
separable.
10D Spectral Theorem for Compact
Operators 


Orthonormal Bases Consisting of Eigenvectors 


We begin this section with the following useful lemma. 


10.96 $T^*T - \|T\|^2 I$ is not invertible


If $T$ is a bounded operator on a nonzero Hilbert space, then $\|T\|^2 \in \operatorname{sp}(T^*T)$.


Proof Suppose $T$ is a bounded operator on a nonzero Hilbert space $V$. Let $f_1, f_2, \ldots$
be a sequence in $V$ such that $\|f_n\| = 1$ for each $n \in \mathbf{Z}^+$ and

10.97 $\displaystyle\lim_{n \to \infty} \|Tf_n\| = \|T\|.$

Then

$$\begin{aligned}
\bigl\|T^*Tf_n - \|T\|^2 f_n\bigr\|^2 &= \|T^*Tf_n\|^2 - 2\|T\|^2 \langle T^*Tf_n, f_n\rangle + \|T\|^4 \\
&= \|T^*Tf_n\|^2 - 2\|T\|^2 \|Tf_n\|^2 + \|T\|^4 \\
&\leq 2\|T\|^4 - 2\|T\|^2 \|Tf_n\|^2, && \text{(10.98)}
\end{aligned}$$

where the last line holds because $\|T^*Tf_n\| \leq \|T^*\|\, \|Tf_n\| \leq \|T\|^2$. Now 10.97
and 10.98 imply that

$$\lim_{n \to \infty} \bigl(T^*T - \|T\|^2 I\bigr) f_n = 0.$$

Because $\|f_n\| = 1$ for each $n \in \mathbf{Z}^+$, the equation above implies that $T^*T - \|T\|^2 I$
is not invertible, as desired.


The next result indicates one way in which self-adjoint compact operators behave 
like self-adjoint operators on finite-dimensional Hilbert spaces. 


10.99 every self-adjoint compact operator has an eigenvalue


Suppose T is a self-adjoint compact operator on a nonzero Hilbert space. Then 
either ||T|| or —||T|| is an eigenvalue of T. 


Proof Because $T$ is self-adjoint, 10.96 states that $T^2 - \|T\|^2 I$ is not invertible. Now

$$T^2 - \|T\|^2 I = \bigl(T - \|T\| I\bigr)\bigl(T + \|T\| I\bigr).$$

Thus $T - \|T\| I$ and $T + \|T\| I$ cannot both be invertible. Hence $\|T\| \in \operatorname{sp}(T)$ or
$-\|T\| \in \operatorname{sp}(T)$. Because $T$ is compact, 10.85 now implies that $\|T\|$ or $-\|T\|$ is an
eigenvalue of $T$, as desired, or that $\|T\| = 0$, which means that $T = 0$, in which case
0 is an eigenvalue of $T$.
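The factorization step can be checked numerically. The sketch below uses a hypothetical real symmetric 2-by-2 matrix. It approximates $\|T\|$ by sampling unit vectors, a crude method chosen only for transparency, and confirms that exactly one of $T - \|T\|I$ and $T + \|T\|I$ is singular. In this example it is $-\|T\|$ that turns out to be the eigenvalue.

```python
import math

def apply(T, f):
    (a, b), (c, d) = T
    return (a * f[0] + b * f[1], c * f[0] + d * f[1])

def op_norm(T, samples=100_000):
    """Approximate ||T|| = sup{||Tf|| : ||f|| = 1} by sampling unit vectors.

    Angles in [0, pi) suffice because f and -f give the same ||Tf||.
    """
    best = 0.0
    for i in range(samples):
        t = math.pi * i / samples
        best = max(best, math.hypot(*apply(T, (math.cos(t), math.sin(t)))))
    return best

def det_shift(T, s):
    """det(T - s I) for a 2-by-2 matrix T."""
    (a, b), (c, d) = T
    return (a - s) * (d - s) - b * c

# Hypothetical self-adjoint example; by hand, its eigenvalues are 2 and -3.
T = ((1.0, 2.0), (2.0, -2.0))
n = op_norm(T)
print(n)                              # close to 3
print(det_shift(T, n))                # close to 6: T - ||T|| I is invertible
print(abs(det_shift(T, -n)) < 1e-6)   # True: T + ||T|| I is not, so -||T|| is an eigenvalue
```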
If $T$ is an operator on a vector space $V$ and $U$ is a subspace of $V$, then $T|_U$ is a
linear map from $U$ to $V$. For $T|_U$ to be an operator (meaning that it is a linear map
from a vector space to itself), we need $T(U) \subset U$. Thus we are led to the following
definition. 


10.100 Definition invariant subspace 


Suppose T is an operator on a vector space V. A subspace U of V is called an 
invariant subspace for $T$ if $Tf \in U$ for every $f \in U$.


10.101 Example invariant subspaces 
You should verify each of the assertions below. 


• For $b \in [0,1]$, the subspace

$$\{f \in L^2([0,1]) : f(t) = 0 \text{ for almost every } t \in [0,b]\}$$

is an invariant subspace for the Volterra operator $\mathcal{V}\colon L^2([0,1]) \to L^2([0,1])$
defined by $(\mathcal{V}f)(x) = \int_0^x f$.

• Suppose $T$ is an operator on a Hilbert space $V$ and $f \in V$ with $f \neq 0$. Then
$\operatorname{span}\{f\}$ is an invariant subspace for $T$ if and only if $f$ is an eigenvector of $T$.

• Suppose $T$ is an operator on a Hilbert space $V$. Then $\{0\}$, $V$, $\operatorname{null} T$, and $\operatorname{range} T$
are invariant subspaces for $T$.

• If $T$ is a bounded operator on a Hilbert space and $U$ is an invariant subspace for
$T$, then $\overline{U}$ is an invariant subspace for $T$.


If $T$ is a compact operator on a Hilbert space and $U$ is an invariant subspace for
$T$, then $T|_U$ is a compact operator on $U$, as follows from the definitions.

The most important open question in operator theory is the invariant subspace
problem, which asks whether every bounded operator on a Hilbert space with dimension
greater than 1 has a closed invariant subspace other than $\{0\}$ and $V$.

If $U$ is an invariant subspace for a self-adjoint operator $T$, then $T|_U$ is self-adjoint
because

$$\langle (T|_U) f, g\rangle = \langle Tf, g\rangle = \langle f, Tg\rangle = \langle f, (T|_U) g\rangle$$

for all $f, g \in U$. The next result shows that a bit more is true.


10.102 $U$ invariant for self-adjoint $T$ implies $U^\perp$ invariant for $T$


Suppose U is an invariant subspace for a self-adjoint operator T. Then 


(a) $U^\perp$ is also an invariant subspace for $T$;

(b) $T|_{U^\perp}$ is a self-adjoint operator on $U^\perp$.

Proof To prove (a), suppose $f \in U^\perp$. If $g \in U$, then

$$\langle Tf, g\rangle = \langle f, Tg\rangle = 0,$$

where the first equality holds because $T$ is self-adjoint and the second equality holds
because $Tg \in U$ and $f \in U^\perp$. Because the equation above holds for all $g \in U$, we
conclude that $Tf \in U^\perp$. Thus $U^\perp$ is an invariant subspace for $T$, proving (a).
By part (a), we can think of $T|_{U^\perp}$ as an operator on $U^\perp$. To prove (b), suppose
$h \in U^\perp$. If $f \in U^\perp$, then

$$\langle f, (T|_{U^\perp})^* h\rangle = \langle (T|_{U^\perp}) f, h\rangle = \langle Tf, h\rangle = \langle f, Th\rangle = \langle f, (T|_{U^\perp}) h\rangle.$$

Because $(T|_{U^\perp})^* h$ and $(T|_{U^\perp}) h$ are both in $U^\perp$ and the equation above holds for all
$f \in U^\perp$, we conclude that $(T|_{U^\perp})^* h = (T|_{U^\perp}) h$, proving (b).
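The self-adjointness hypothesis in 10.102 cannot be dropped. The following sketch tests invariance of one-dimensional subspaces of $\mathbf{R}^2$ for two hypothetical matrices. For the non-self-adjoint matrix $T = [[1, 1], [0, 1]]$, the line $U = \operatorname{span}\{(1, 0)\}$ is invariant but $U^\perp = \operatorname{span}\{(0, 1)\}$ is not. For the symmetric matrix $S = [[2, 1], [1, 2]]$ and $U = \operatorname{span}\{(1, 1)\}$, both $U$ and $U^\perp$ are invariant.

```python
def apply(M, f):
    return tuple(sum(m * x for m, x in zip(row, f)) for row in M)

def in_span(v, u, tol=1e-12):
    """True if the vector v in R^2 lies in span{u} (2-by-2 determinant test)."""
    return abs(v[0] * u[1] - v[1] * u[0]) < tol

def invariance(M, u, u_perp):
    """Return (U invariant?, U-perp invariant?) for U = span{u} in R^2."""
    return in_span(apply(M, u), u), in_span(apply(M, u_perp), u_perp)

T = ((1.0, 1.0), (0.0, 1.0))   # not self-adjoint
S = ((2.0, 1.0), (1.0, 2.0))   # self-adjoint

print(invariance(T, (1.0, 0.0), (0.0, 1.0)))   # (True, False)
print(invariance(S, (1.0, 1.0), (1.0, -1.0)))  # (True, True)
```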


Operators for which there exists an orthonormal basis consisting of eigenvectors 
may be the easiest operators to understand. The next result states that any such 
operator must be self-adjoint in the case of a real Hilbert space and normal in the 
case of a complex Hilbert space. 


10.103 orthonormal basis of eigenvectors implies self-adjoint or normal 


Suppose T is a bounded operator on a Hilbert space V and there is an orthonormal 
basis of V consisting of eigenvectors of T. 


(a) If $\mathbf{F} = \mathbf{R}$, then $T$ is self-adjoint.
(b) If $\mathbf{F} = \mathbf{C}$, then $T$ is normal.


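Proof Suppose $\{e_j\}_{j \in \Gamma}$ is an orthonormal basis of $V$ such that $e_j$ is an eigenvector
of $T$ for each $j \in \Gamma$. Thus there exists a family $\{\alpha_j\}_{j \in \Gamma}$ in $\mathbf{F}$ such that

10.104 $Te_j = \alpha_j e_j$

for each $j \in \Gamma$. If $k \in \Gamma$ and $f \in V$, then

$$\begin{aligned}
\langle f, T^* e_k\rangle &= \langle Tf, e_k\rangle = \Bigl\langle T\Bigl(\sum_{j \in \Gamma} \langle f, e_j\rangle e_j\Bigr), e_k\Bigr\rangle \\
&= \Bigl\langle \sum_{j \in \Gamma} \alpha_j \langle f, e_j\rangle e_j, e_k\Bigr\rangle = \alpha_k \langle f, e_k\rangle = \langle f, \overline{\alpha_k} e_k\rangle.
\end{aligned}$$

The equation above implies that

10.105 $T^* e_k = \overline{\alpha_k} e_k$.

To prove (a), suppose $\mathbf{F} = \mathbf{R}$. Then 10.105 and 10.104 imply $T^* e_k = \alpha_k e_k = Te_k$
for each $k \in \Gamma$. Hence $T^* = T$, completing the proof of (a).
To prove (b), now suppose $\mathbf{F} = \mathbf{C}$. If $k \in \Gamma$, then 10.105 and 10.104 imply that

$$(T^*T)(e_k) = T^*(\alpha_k e_k) = |\alpha_k|^2 e_k = T(\overline{\alpha_k} e_k) = (TT^*)(e_k).$$

Because the equation above holds for all $k \in \Gamma$, we conclude that $T^*T = TT^*$. Thus
$T$ is normal, completing the proof of (b).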
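The sketch below runs the situation of 10.103 in reverse on $\mathbf{C}^2$. It starts from a hypothetical orthonormal basis of eigenvectors (the columns of `E`) and prescribed eigenvalues (the diagonal of `D`), forms $T = EDE^*$, and checks that $T$ is normal. Because one eigenvalue is not real, $T$ is not self-adjoint. This is why part (b) concludes only normality.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

s = 1 / math.sqrt(2)
E = [[s, s], [1j * s, -1j * s]]   # columns e_1 = (1, i)/sqrt(2), e_2 = (1, -i)/sqrt(2)
D = [[2 + 1j, 0], [0, -1]]        # T e_1 = (2 + i) e_1,  T e_2 = -e_2
T = matmul(matmul(E, D), adjoint(E))
T_star = adjoint(T)

print(close(matmul(adjoint(E), E), [[1, 0], [0, 1]]))  # True: {e_1, e_2} is orthonormal
print(close(matmul(T_star, T), matmul(T, T_star)))    # True: T is normal
print(close(T_star, T))                                # False: T is not self-adjoint
```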
The next result is one of the major highlights of the theory of compact operators
on Hilbert spaces. The result as stated below applies to both real and complex Hilbert 
spaces. In the case of a real Hilbert space, the result below can be combined with 
10.103(a) to produce the following result: A compact operator on a real Hilbert 
space is self-adjoint if and only if there is an orthonormal basis of the Hilbert space 
consisting of eigenvectors of the operator. 


10.106 Spectral Theorem for self-adjoint compact operators 
Suppose T is a self-adjoint compact operator on a Hilbert space V. Then 


(a) there is an orthonormal basis of V consisting of eigenvectors of T; 


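(b) there is a countable set $\Omega$, an orthonormal family $\{e_k\}_{k \in \Omega}$ in $V$, and a family
$\{\alpha_k\}_{k \in \Omega}$ in $\mathbf{R} \setminus \{0\}$ such that

$$Tf = \sum_{k \in \Omega} \alpha_k \langle f, e_k\rangle e_k$$

for every $f \in V$.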


Proof Let $U$ denote the span of all the eigenvectors of $T$. Then $U$ is an invariant
subspace for $T$. Hence $U^\perp$ is also an invariant subspace for $T$ and $T|_{U^\perp}$ is a
self-adjoint operator on $U^\perp$ (by 10.102). However, $T|_{U^\perp}$ has no eigenvalues, because
all the eigenvectors of $T$ are in $U$. Because all self-adjoint compact operators on a
nonzero Hilbert space have an eigenvalue (by 10.99), this implies that $U^\perp = \{0\}$.
Hence $\overline{U} = V$ (by 8.42).

For each eigenvalue $\alpha$ of $T$, there is an orthonormal basis of $\operatorname{null}(T - \alpha I)$ consisting
of eigenvectors corresponding to the eigenvalue $\alpha$. The union (over all eigenvalues
$\alpha$ of $T$) of all these orthonormal bases is an orthonormal family in $V$ because eigenvectors
corresponding to distinct eigenvalues are orthogonal (see 10.57). The previous
paragraph tells us that the closure of the span of this orthonormal family is $V$ (here
we are using the set itself as the index set). Hence we have an orthonormal basis of
$V$ consisting of eigenvectors of $T$, completing the proof of (a).

By part (a) of this result, there is an orthonormal basis $\{e_k\}_{k \in \Gamma}$ of $V$ and a family
$\{\alpha_k\}_{k \in \Gamma}$ in $\mathbf{R}$ such that $Te_k = \alpha_k e_k$ for each $k \in \Gamma$ (even if $\mathbf{F} = \mathbf{C}$, the eigenvalues
of $T$ are in $\mathbf{R}$ by 10.49). Thus if $f \in V$, then

$$Tf = T\Bigl(\sum_{k \in \Gamma} \langle f, e_k\rangle e_k\Bigr) = \sum_{k \in \Gamma} \langle f, e_k\rangle Te_k = \sum_{k \in \Gamma} \alpha_k \langle f, e_k\rangle e_k.$$

Letting $\Omega = \{k \in \Gamma : \alpha_k \neq 0\}$, we can rewrite the equation above as

$$Tf = \sum_{k \in \Omega} \alpha_k \langle f, e_k\rangle e_k$$

for every $f \in V$. The set $\Omega$ is countable because $T$ has only countably many
eigenvalues (by 10.93) and each nonzero eigenvalue can appear only finitely many
times in the sum above (by 10.82), completing the proof of (b).
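Here is a small numerical check of 10.106(b) in $\mathbf{R}^3$. It uses a hypothetical symmetric matrix whose orthonormal eigenvectors were found by hand. The eigenvalue 0 is omitted from the sum, mirroring the restriction to $\Omega = \{k : \alpha_k \neq 0\}$, and the result still agrees with $Tf$.

```python
import math
import random

T = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 5.0]]  # hypothetical self-adjoint example on R^3

r = 1 / math.sqrt(2)
# Orthonormal eigenvectors found by hand: T e = alpha e.
eigenpairs = [(2.0, [r, r, 0.0]), (0.0, [r, -r, 0.0]), (5.0, [0.0, 0.0, 1.0])]

def inner(f, g):
    return sum(a * b for a, b in zip(f, g))

def apply(T, f):
    return [inner(row, f) for row in T]

def spectral_apply(pairs, f):
    """Compute the sum over Omega = {k : alpha_k != 0} of alpha_k <f, e_k> e_k."""
    out = [0.0] * len(f)
    for alpha, e in pairs:
        if alpha != 0:
            c = alpha * inner(f, e)
            out = [o + c * x for o, x in zip(out, e)]
    return out

random.seed(0)
f = [random.uniform(-1, 1) for _ in range(3)]
print(apply(T, f))
print(spectral_apply(eigenpairs, f))  # agrees with T f up to rounding
```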
A normal compact operator on a nonzero real Hilbert space might have no eigenvalues
[consider, for example, the normal operator $T$ of counterclockwise rotation by
a right angle on $\mathbf{R}^2$ defined by $T(x, y) = (-y, x)$]. However, the next result shows
that normal compact operators on complex Hilbert spaces behave better. The key idea 
in proving this result is that on a complex Hilbert space, the real and imaginary parts 
of a normal compact operator are commuting self-adjoint compact operators, which 
then allows us to apply the Spectral Theorem for self-adjoint compact operators. 


10.107 Spectral Theorem for normal compact operators 


Suppose T is a compact operator on a complex Hilbert space V. Then there is an 
orthonormal basis of V consisting of eigenvectors of T if and only if T is normal. 


Proof One direction of this result has already been proved as part (b) of 10.103. 
To prove the other direction, suppose T is a normal compact operator. We can 
write 
$$T = A + iB,$$

where $A$ and $B$ are self-adjoint operators and, because $T$ is normal, $AB = BA$ (see
10.54). Because $A = (T + T^*)/2$ and $B = (T - T^*)/(2i)$, the operators $A$ and $B$
are both compact.

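If $\alpha \in \mathbf{R}$ and $f \in \operatorname{null}(A - \alpha I)$, then

$$(A - \alpha I)(Bf) = A(Bf) - \alpha Bf = B(Af) - \alpha Bf = B\bigl((A - \alpha I)f\bigr) = B(0) = 0$$

and thus $Bf \in \operatorname{null}(A - \alpha I)$. Hence $\operatorname{null}(A - \alpha I)$ is an invariant subspace for $B$.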

Applying the Spectral Theorem for self-adjoint compact operators [10.106(a)] to
$B|_{\operatorname{null}(A - \alpha I)}$ shows that for each eigenvalue $\alpha$ of $A$, there is an orthonormal basis of
$\operatorname{null}(A - \alpha I)$ consisting of eigenvectors of $B$. The union (over all eigenvalues $\alpha$ of
$A$) of all these orthonormal bases is an orthonormal family in $V$ (use the set itself as
the index set) because eigenvectors of A corresponding to distinct eigenvalues of A 
are orthogonal (see 10.57). The Spectral Theorem for self-adjoint compact operators 
[10.106(a)] as applied to A tells us that the closure of the span of this orthonormal 
family is V. Hence we have an orthonormal basis of V, each of whose elements is an 
eigenvector of A and an eigenvector of B. 

If $f \in V$ is an eigenvector of both $A$ and $B$, then there exist $\alpha, \beta \in \mathbf{R}$ such
that $Af = \alpha f$ and $Bf = \beta f$. Thus $Tf = (A + iB)(f) = (\alpha + \beta i)f$; hence $f$ is
an eigenvector of T. Thus the orthonormal basis of V constructed in the previous 
paragraph is an orthonormal basis consisting of eigenvectors of T, completing the 
proof. 
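For the rotation $T(x, y) = (-y, x)$ mentioned before 10.107, the real case has no eigenvalues, but on $\mathbf{C}^2$ the recipe of the proof works. The sketch below uses plain Python complex arithmetic, with the eigenvectors of $B$ found by hand. It forms $A = (T + T^*)/2$ and $B = (T - T^*)/(2i)$ and checks that $AB = BA$. It then confirms that each eigenvector of $B$ is an eigenvector of $T$ with eigenvalue $\alpha + \beta i$. Here $A = 0$, so every vector is an eigenvector of $A$ with $\alpha = 0$.

```python
import math

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adjoint(M):
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

def close_vec(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))

T = [[0, -1], [1, 0]]                      # counterclockwise rotation by a right angle
Ts = adjoint(T)
A = [[(T[i][j] + Ts[i][j]) / 2 for j in range(2)] for i in range(2)]
B = [[(T[i][j] - Ts[i][j]) / 2j for j in range(2)] for i in range(2)]

print(A)                                                         # the zero matrix
print(close_vec(sum(matmul(A, B), []), sum(matmul(B, A), [])))   # True: AB = BA

s = 1 / math.sqrt(2)
alpha = 0                                  # A = 0, so A v = 0 v for every v
for beta, v in [(1, [s, -1j * s]), (-1, [s, 1j * s])]:
    assert close_vec(matvec(B, v), [beta * x for x in v])        # B v = beta v
    print(beta, close_vec(matvec(T, v), [(alpha + beta * 1j) * x for x in v]))  # True
```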


The following example shows the power of the Spectral Theorem for normal 
compact operators. Finding the eigenvalues and eigenvectors of the normal compact 
operator $\mathcal{V} - \mathcal{V}^*$ in the next example leads us to an orthonormal basis of $L^2([0,1])$.
Easy calculus shows that the family $\{e_k\}_{k \in \mathbf{Z}}$, where $e_k$ is defined as in 10.112, is
an orthonormal family in $L^2([0,1])$. The hard part of showing that $\{e_k\}_{k \in \mathbf{Z}}$ is an
orthonormal basis of $L^2([0,1])$ is to show that the closure of the span of this family
is $L^2([0,1])$. However, the Spectral Theorem for normal compact operators (10.107)
provides this information with no further work required. 
10.108 Example an orthonormal basis of eigenvectors
Suppose $\mathcal{V}\colon L^2([0,1]) \to L^2([0,1])$ is the Volterra operator defined by

$$(\mathcal{V}f)(x) = \int_0^x f.$$

The operator $\mathcal{V}$ is compact (see the paragraph after the proof of 10.70), but it is not
normal. Because $\mathcal{V}$ is compact, so is $\mathcal{V}^*$ (by 10.73). Hence $\mathcal{V} - \mathcal{V}^*$ is compact. Also,
$(\mathcal{V} - \mathcal{V}^*)^* = \mathcal{V}^* - \mathcal{V} = -(\mathcal{V} - \mathcal{V}^*)$. Because every operator commutes with its
negative, we conclude that $\mathcal{V} - \mathcal{V}^*$ is a compact normal operator. Because we want
to apply the Spectral Theorem, for the rest of this example we will take $\mathbf{F} = \mathbf{C}$.

If $f \in L^2([0,1])$ and $x \in [0,1]$, then the formula for $\mathcal{V}^*$ given by 10.16 shows
that

10.109 $\displaystyle\bigl((\mathcal{V} - \mathcal{V}^*)f\bigr)(x) = 2\int_0^x f - \int_0^1 f.$

The right side of the equation above is a continuous function of $x$ whose value at
$x = 0$ is the negative of its value at $x = 1$.

Differentiating both sides of the equation above and using the Lebesgue Differentiation
Theorem (4.19) shows that

$$\bigl((\mathcal{V} - \mathcal{V}^*)f\bigr)'(x) = 2f(x)$$

for almost every $x \in [0,1]$. If $f \in \operatorname{null}(\mathcal{V} - \mathcal{V}^*)$, then differentiating both sides
of the equation $(\mathcal{V} - \mathcal{V}^*)f = 0$ shows that $2f(x) = 0$ for almost every $x \in [0,1]$;
hence $f = 0$, and we conclude that $\mathcal{V} - \mathcal{V}^*$ is injective (so 0 is not an eigenvalue).

Suppose $f$ is an eigenvector of $\mathcal{V} - \mathcal{V}^*$ with eigenvalue $\alpha$. Because $\alpha \neq 0$ (by the previous
paragraph), $f = (\mathcal{V} - \mathcal{V}^*)(f/\alpha)$ is in the range of $\mathcal{V} - \mathcal{V}^*$, which by 10.109 implies that
$f$ is continuous on $[0,1]$, which by 10.109 again implies that $f$ is continuously
differentiable on $(0,1)$. Differentiating both sides of the equation $(\mathcal{V} - \mathcal{V}^*)f = \alpha f$ gives

$$2f(x) = \alpha f'(x)$$

for all $x \in (0,1)$. Hence the function whose value at $x$ is $e^{-(2/\alpha)x} f(x)$ has derivative
0 everywhere on $(0,1)$ and thus is a constant function. In other words,

10.110 $f(x) = c\, e^{(2/\alpha)x}$

for some constant $c \neq 0$. Because $f \in \operatorname{range}(\mathcal{V} - \mathcal{V}^*)$, we have $f(0) = -f(1)$,
which with the equation above implies that there exists $k \in \mathbf{Z}$ such that

10.111 $2/\alpha = i(2k+1)\pi.$

Replacing $2/\alpha$ in 10.110 with the value of $2/\alpha$ derived in 10.111 shows that for
$k \in \mathbf{Z}$, we should define $e_k \in L^2([0,1])$ by

10.112 $e_k(x) = e^{(2k+1)\pi i x}.$

Clearly $\{e_k\}_{k \in \mathbf{Z}}$ is an orthonormal family in $L^2([0,1])$ [the orthogonality can be
verified by a straightforward calculation or by using 10.57]. The paragraph above
and the Spectral Theorem for compact normal operators (10.107) imply that this
orthonormal family is an orthonormal basis of $L^2([0,1])$.
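The conclusions of Example 10.108 are easy to test numerically. The sketch below is a rough check, not a proof. It evaluates $(\mathcal{V} - \mathcal{V}^*)e_k$ through formula 10.109 using composite Simpson quadrature (our choice of rule) and compares the result with the eigenvalue $2/((2k+1)\pi i)$ from 10.111. It also approximates the inner products $\langle e_j, e_k\rangle$ with the midpoint rule, which is exact up to rounding for these exponentials.

```python
import cmath
import math

def e(k, x):
    """e_k(x) = exp((2k+1) pi i x), as in 10.112."""
    return cmath.exp((2 * k + 1) * math.pi * 1j * x)

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for a complex-valued g on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def v_minus_vstar(g, x):
    """((V - V*) g)(x) = 2 * integral_0^x g - integral_0^1 g, as in 10.109."""
    return 2 * simpson(g, 0.0, x) - simpson(g, 0.0, 1.0)

def inner(f, g, n=2000):
    """<f, g> in L^2([0,1]) by the midpoint rule."""
    return sum(f((i + 0.5) / n) * g((i + 0.5) / n).conjugate() for i in range(n)) / n

ks = range(-2, 3)
eigen_errors = {}
for k in ks:
    ek = lambda x, k=k: e(k, x)
    alpha = 2 / ((2 * k + 1) * math.pi * 1j)      # eigenvalue from 10.111
    eigen_errors[k] = max(abs(v_minus_vstar(ek, x) - alpha * ek(x))
                          for x in (0.1, 0.37, 0.5, 0.9))

gram_error = max(abs(inner(lambda x, j=j: e(j, x), lambda x, k=k: e(k, x)) - (j == k))
                 for j in ks for k in ks)
print(eigen_errors)
print(gram_error)
```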
Singular Value Decomposition


The next result provides an important generalization of 10.106(b) to arbitrary compact 
operators that need not be self-adjoint or normal. This generalization requires two 
orthonormal families, as compared to the single orthonormal family in 10.106(b). 


10.113 Singular Value Decomposition 


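Suppose $T$ is a compact operator on a Hilbert space $V$. Then there exist a
countable set $\Omega$, orthonormal families $\{e_k\}_{k \in \Omega}$ and $\{h_k\}_{k \in \Omega}$ in $V$, and a family
$\{s_k\}_{k \in \Omega}$ of positive numbers such that

10.114 $\displaystyle Tf = \sum_{k \in \Omega} s_k \langle f, e_k\rangle h_k$

for every $f \in V$.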


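Proof If $\alpha$ is an eigenvalue of $T^*T$, then $(T^*T)f = \alpha f$ for some $f \neq 0$ and

$$\alpha \|f\|^2 = \langle \alpha f, f\rangle = \langle T^*Tf, f\rangle = \langle Tf, Tf\rangle = \|Tf\|^2.$$

Thus $\alpha \geq 0$. Hence all eigenvalues of $T^*T$ are nonnegative.

Apply 10.106(b) and the conclusion of the paragraph above to the self-adjoint
compact operator $T^*T$, getting a countable set $\Omega$, an orthonormal family $\{e_k\}_{k \in \Omega}$ in
$V$, and a family $\{s_k\}_{k \in \Omega}$ of positive numbers (take $s_k = \sqrt{\alpha_k}$, where $\{\alpha_k\}_{k \in \Omega}$ is the
family of nonzero, hence positive, eigenvalues given by 10.106(b)) such that

10.115 $\displaystyle (T^*T)f = \sum_{k \in \Omega} s_k^2 \langle f, e_k\rangle e_k$

for every $f \in V$. The equation above implies that $(T^*T)e_j = s_j^2 e_j$ for each $j \in \Omega$.
For $k \in \Omega$, let

$$h_k = \frac{Te_k}{s_k}.$$

For $j, k \in \Omega$, we have

$$\langle h_j, h_k\rangle = \frac{1}{s_j s_k}\langle Te_j, Te_k\rangle = \frac{1}{s_j s_k}\langle T^*Te_j, e_k\rangle = \frac{s_j}{s_k}\langle e_j, e_k\rangle.$$

The equation above implies that $\{h_k\}_{k \in \Omega}$ is an orthonormal family in $V$.
If $f \in \overline{\operatorname{span}}\{e_k\}_{k \in \Omega}$, then

$$Tf = T\Bigl(\sum_{k \in \Omega} \langle f, e_k\rangle e_k\Bigr) = \sum_{k \in \Omega} \langle f, e_k\rangle Te_k = \sum_{k \in \Omega} s_k \langle f, e_k\rangle h_k,$$

showing that 10.114 holds for such $f$.
If $f \in \bigl(\overline{\operatorname{span}}\{e_k\}_{k \in \Omega}\bigr)^\perp$, then 10.115 shows that $(T^*T)f = 0$, which implies
that $Tf = 0$ (because $0 = \langle T^*Tf, f\rangle = \|Tf\|^2$); thus both sides of 10.114 are 0.
Hence the two sides of 10.114 agree for $f$ in a closed subspace of $V$ and for $f$ in
the orthogonal complement of that closed subspace, which by linearity implies that
the two sides of 10.114 agree for all $f \in V$.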
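The proof is constructive, and for a matrix it can be followed step by step. The sketch below uses a hypothetical non-normal 2-by-2 real matrix $T$. It diagonalizes $T^*T$ with the closed-form eigenvectors of a symmetric 2-by-2 matrix. That helper assumes the off-diagonal entry is nonzero. It then sets $s_k = \sqrt{\alpha_k}$ and $h_k = Te_k/s_k$ and checks two things: that $\{h_k\}$ is orthonormal and that 10.114 reproduces $Tf$.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(M, x):
    return tuple(dot(row, x) for row in M)

def sym2_eigen(a, b, c):
    """Eigenpairs of the real symmetric matrix [[a, b], [b, c]]; assumes b != 0."""
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (mid + rad, mid - rad):
        v = (b, lam - a)              # solves (a - lam) v_1 + b v_2 = 0
        size = math.hypot(*v)
        pairs.append((lam, (v[0] / size, v[1] / size)))
    return pairs

T = ((2.0, 1.0), (0.0, 2.0))                          # hypothetical non-normal example
cols = list(zip(*T))
TtT = [[dot(ci, cj) for cj in cols] for ci in cols]   # T*T = [[4, 2], [2, 5]]

svd = []                                              # triples (s_k, e_k, h_k)
for lam, e in sym2_eigen(TtT[0][0], TtT[0][1], TtT[1][1]):
    if lam > 0:                                       # keep only positive eigenvalues
        s = math.sqrt(lam)
        h = tuple(x / s for x in matvec(T, e))        # h_k = T e_k / s_k
        svd.append((s, e, h))

def svd_apply(f):
    """Right side of 10.114: the sum of s_k <f, e_k> h_k."""
    out = [0.0, 0.0]
    for s, e, h in svd:
        c = s * dot(f, e)
        out = [o + c * x for o, x in zip(out, h)]
    return tuple(out)

f = (0.3, -1.7)
print([s for s, _, _ in svd])        # (sqrt(17) + 1)/2 and (sqrt(17) - 1)/2
print(matvec(T, f), svd_apply(f))    # equal up to rounding
```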
An expression of the form 10.114 is called a singular value decomposition of
the compact operator $T$. The orthonormal families $\{e_k\}_{k \in \Omega}$ and $\{h_k\}_{k \in \Omega}$ in the
singular value decomposition are not uniquely determined by $T$. However, the
positive numbers $\{s_k\}_{k \in \Omega}$ are uniquely determined as positive square roots of positive
eigenvalues of $T^*T$. These positive numbers can be placed in decreasing order
(because if there are infinitely many of them, then they form a sequence with limit 0,
by 10.93). This procedure leads to the definition of singular values given below.

Suppose $T$ is a compact operator. Recall that the geometric multiplicity of a
positive eigenvalue $\alpha$ of $T^*T$ is defined to be $\dim \operatorname{null}(T^*T - \alpha I)$ [see 10.81]. This
geometric multiplicity is the number of times that $\sqrt{\alpha}$ appears in the family $\{s_k\}_{k \in \Omega}$
corresponding to a singular value decomposition of $T$. By 10.82, this geometric
multiplicity is finite.

Now we can define the singular values of a compact operator T, where we are 
careful to repeat the square root of each positive eigenvalue of T*T as many times as 
its geometric multiplicity. 


10.116 Definition singular values 


• Suppose $T$ is a compact operator on a Hilbert space. The singular values
of $T$, denoted $s_1(T) \geq s_2(T) \geq s_3(T) \geq \cdots$, are the positive square roots
of the positive eigenvalues of $T^*T$, arranged in decreasing order with each
singular value $s$ repeated as many times as the geometric multiplicity of $s^2$
as an eigenvalue of $T^*T$.

• If $T^*T$ has only finitely many positive eigenvalues, then define $s_n(T) = 0$
for all $n \in \mathbf{Z}^+$ for which $s_n(T)$ is not defined by the first bullet point.


10.117 Example singular values on a finite-dimensional Hilbert space
Define $T\colon \mathbf{F}^4 \to \mathbf{F}^4$ by
$$T(z_1, z_2, z_3, z_4) = (0, 3z_1, 2z_2, -3z_4).$$
A calculation shows that
$$(T^*T)(z_1, z_2, z_3, z_4) = (9z_1, 4z_2, 0, 9z_4).$$
Thus the eigenvalues of $T^*T$ are 9, 4, 0 and
$$\dim \operatorname{null}(T^*T - 9I) = 2 \quad\text{and}\quad \dim \operatorname{null}(T^*T - 4I) = 1.$$

Taking square roots of the positive eigenvalues of $T^*T$ and then adjoining an infinite
string of 0's shows that the singular values of $T$ are $3 \geq 3 \geq 2 \geq 0 \geq 0 \geq \cdots$.

Note that $-3$ and 0 are the only eigenvalues of $T$. Thus in this case, the list of
eigenvalues of $T$ did not pick up the number 2 that appears in the definition (and
hence the behavior) of $T$, but the list of singular values of $T$ does include 2.
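The computation in Example 10.117 can be reproduced directly from the matrix of $T$ in the standard basis. Because $T^*T$ turns out to be diagonal here, its eigenvalues can be read off the diagonal. The sketch relies on that special feature rather than on a general eigenvalue routine, and it lists only the first few of the trailing zeros.

```python
import math

# Matrix of T(z1, z2, z3, z4) = (0, 3 z1, 2 z2, -3 z4) in the standard basis.
M = [[0, 0, 0, 0],
     [3, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 0, -3]]

Mt = [list(col) for col in zip(*M)]              # real matrix, so T* is the transpose
TstarT = [[sum(Mt[i][r] * M[r][j] for r in range(4)) for j in range(4)] for i in range(4)]
assert all(TstarT[i][j] == 0 for i in range(4) for j in range(4) if i != j)  # diagonal

eigenvalues = [TstarT[i][i] for i in range(4)]   # 9, 4, 0, 9, listed with multiplicity
singular = sorted((math.sqrt(a) for a in eigenvalues if a > 0), reverse=True)
singular += [0.0] * 3                            # s_n(T) = 0 from here on
print(TstarT)
print(singular)                                  # [3.0, 3.0, 2.0, 0.0, 0.0, 0.0]
```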


If $T$ is a compact operator, then the first singular value $s_1(T)$ equals $\|T\|$, as you
are asked to verify in Exercise 12.
10.118 Example singular values of $\mathcal{V} - \mathcal{V}^*$

Let $\mathcal{V}$ denote the Volterra operator and let $T = \mathcal{V} - \mathcal{V}^*$. In Example 10.108, we
saw that if $e_k$ is defined by 10.112 then $\{e_k\}_{k \in \mathbf{Z}}$ is an orthonormal basis of $L^2([0,1])$
and

$$Te_k = \frac{2}{(2k+1)\pi i}\, e_k$$

for each $k \in \mathbf{Z}$, where the eigenvalue shown above corresponding to $e_k$ comes from
10.111. Now 10.56 implies that

$$T^*e_k = \frac{-2}{(2k+1)\pi i}\, e_k$$

for each $k \in \mathbf{Z}$. Hence

10.119 $\displaystyle T^*Te_k = \frac{4}{(2k+1)^2\pi^2}\, e_k$

for each $k \in \mathbf{Z}$. After taking positive square roots of the eigenvalues, we see that the
equation above shows that the singular values of $T$ are

$$\frac{2}{\pi} \geq \frac{2}{\pi} \geq \frac{2}{3\pi} \geq \frac{2}{3\pi} \geq \frac{2}{5\pi} \geq \frac{2}{5\pi} \geq \cdots,$$

where the first two singular values above come from taking $k = -1$ and $k = 0$ in
10.119, the next two singular values above come from taking $k = -2$ and $k = 1$,
the next two singular values above come from taking $k = -3$ and $k = 2$, and so on.

Each singular value of $T$ appears twice in the list of singular values above because
each eigenvalue of $T^*T$ has geometric multiplicity 2.
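The bookkeeping that pairs $k$ with $-k-1$ can be made explicit. The sketch below takes the eigenvalues of $T^*T$ from 10.119 for a finite window of $k$, a truncation chosen only for display. It takes their square roots, $2/(|2k+1|\pi)$, and sorts them in decreasing order. Python's `sorted` is stable, so equal values keep the order in which they appear.

```python
import math

def singular_values_with_k(kmin, kmax):
    """Pairs (s, k) for kmin <= k <= kmax, where s = sqrt(4 / ((2k+1)^2 pi^2))
    is the square root of the eigenvalue of T*T on e_k from 10.119."""
    pairs = [(math.sqrt(4 / ((2 * k + 1) ** 2 * math.pi ** 2)), k)
             for k in range(kmin, kmax + 1)]
    return sorted(pairs, key=lambda p: -p[0])   # decreasing order, ties kept in input order

for s, k in singular_values_with_k(-3, 2):
    print(f"k = {k:2d}: s = {s:.6f}, |2k+1| * pi * s = {abs(2 * k + 1) * math.pi * s:.6f}")
```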


For $n \in \mathbf{Z}^+$, the singular value $s_n(T)$ of a compact operator $T$ tells us how well
we can approximate $T$ by operators whose range has dimension less than $n$ (see
Exercise 15).

The next result makes an important connection between $K \in L^2(\mu \times \mu)$ and the
singular values of the integral operator associated with $K$.


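10.120 sum of squares of singular values of integral operator

Suppose $\mu$ is a $\sigma$-finite measure and $K \in L^2(\mu \times \mu)$. Then

$$\|K\|_{L^2(\mu \times \mu)}^2 = \sum_{n=1}^\infty \bigl(s_n(\mathcal{I}_K)\bigr)^2.$$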

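Proof Consider a singular value decomposition

10.121 $\displaystyle \mathcal{I}_K(f) = \sum_{k \in \Omega} s_k \langle f, e_k\rangle h_k$

of the compact operator $\mathcal{I}_K$. Extend $\{e_j\}_{j \in \Omega}$ to an orthonormal basis $\{e_j\}_{j \in \Gamma}$ of
$L^2(\mu)$, and extend $\{h_k\}_{k \in \Omega}$ to an orthonormal basis $\{h_k\}_{k \in \Gamma'}$ of $L^2(\mu)$.

Let $X$ denote the set on which the measure $\mu$ lives. For $j \in \Gamma$ and $k \in \Gamma'$, define
$g_{j,k}\colon X \times X \to \mathbf{F}$ by

$$g_{j,k}(x, y) = \overline{e_j(y)}\, h_k(x).$$

Then $\{g_{j,k}\}_{j \in \Gamma, k \in \Gamma'}$ is an orthonormal basis of $L^2(\mu \times \mu)$, as you should verify. Thus

$$\begin{aligned}
\|K\|_{L^2(\mu \times \mu)}^2 &= \sum_{j \in \Gamma} \sum_{k \in \Gamma'} \bigl|\langle K, g_{j,k}\rangle\bigr|^2 \\
&= \sum_{j \in \Gamma} \sum_{k \in \Gamma'} \Bigl|\int_X \int_X K(x, y)\, e_j(y)\, \overline{h_k(x)}\, d\mu(y)\, d\mu(x)\Bigr|^2 \\
&= \sum_{j \in \Gamma} \sum_{k \in \Gamma'} \bigl|\langle \mathcal{I}_K e_j, h_k\rangle\bigr|^2 \\
&= \sum_{j \in \Omega} \sum_{k \in \Gamma'} \bigl|\langle s_j h_j, h_k\rangle\bigr|^2 && \text{(10.122)} \\
&= \sum_{j \in \Omega} s_j^2, && \text{(10.123)}
\end{aligned}$$

where 10.122 holds because 10.121 shows that $\mathcal{I}_K e_j = s_j h_j$ for $j \in \Omega$ and $\mathcal{I}_K e_j = 0$
for $j \in \Gamma \setminus \Omega$; 10.123 holds because $\{h_k\}_{k \in \Gamma'}$ is an orthonormal family. Because
the family $\{s_j\}_{j \in \Omega}$ consists of the positive singular values of $\mathcal{I}_K$, each repeated
according to its multiplicity, the last sum equals $\sum_{n=1}^\infty \bigl(s_n(\mathcal{I}_K)\bigr)^2$.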
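When $\mu$ is counting measure on a finite set $\{1, \ldots, n\}$, the integral operator $\mathcal{I}_K$ is just multiplication by the matrix $K$. In that case 10.120 says that the sum of the squares of the entries of $K$ equals the sum of the squares of its singular values. Here is a small check with a hypothetical 2-by-2 real matrix, computing the singular values in closed form from $K^t K$.

```python
import math

def singular_values_2x2(K):
    """Singular values of a real 2-by-2 matrix K: square roots of the eigenvalues of K^t K."""
    (a, b), (c, d) = K
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # K^t K = [[p, q], [q, r]]
    mid, rad = (p + r) / 2, math.hypot((p - r) / 2, q)
    return [math.sqrt(mid + rad), math.sqrt(max(mid - rad, 0.0))]  # clamp rounding

# Hypothetical kernel on {1, 2} x {1, 2}; with counting measure, I_K f = K f.
K = ((1.0, 2.0), (3.0, 4.0))
hs_norm_sq = sum(x * x for row in K for x in row)   # ||K||^2 in L^2(mu x mu)
sv_sq = sum(s * s for s in singular_values_2x2(K))
print(hs_norm_sq, sv_sq)                             # both 30, the second up to rounding
```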


Now we can give a spectacular application of the previous result. 


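10.124 Example $\dfrac{1}{1^2} + \dfrac{1}{3^2} + \dfrac{1}{5^2} + \cdots = \dfrac{\pi^2}{8}$

Define $K\colon [0,1] \times [0,1] \to \mathbf{R}$ by

$$K(x, y) = \begin{cases} 1 & \text{if } x > y, \\ 0 & \text{if } x = y, \\ -1 & \text{if } x < y. \end{cases}$$

Letting $\mu$ be Lebesgue measure on $[0,1]$, we note that $\mathcal{I}_K$ is the normal compact
operator $\mathcal{V} - \mathcal{V}^*$ examined in Example 10.118.

Clearly $\|K\|_{L^2(\mu \times \mu)} = 1$. Using the list of singular values for $\mathcal{I}_K$ obtained in
Example 10.118, the formula in 10.120 tells us that

$$1 = 2 \sum_{k=0}^\infty \frac{4}{(2k+1)^2 \pi^2}.$$

Thus

$$\frac{1}{1^2} + \frac{1}{3^2} + \frac{1}{5^2} + \cdots = \frac{\pi^2}{8}.$$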
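The two limits in Example 10.124 can be watched converging. The sketch below sums the squares of the $2N$ largest singular values of $\mathcal{V} - \mathcal{V}^*$, taken from Example 10.118, and the corresponding partial sums of $1/1^2 + 1/3^2 + \cdots$. The tail after $N$ terms is roughly $1/(4N)$, so the agreement improves slowly.

```python
import math

def sum_sv_squares(N):
    """Sum of s_n(T)^2 over the 2N largest singular values 2/((2k+1) pi), each taken twice."""
    return sum(2 * (2 / ((2 * k + 1) * math.pi)) ** 2 for k in range(N))

def odd_reciprocal_squares(N):
    """Partial sum 1/1^2 + 1/3^2 + ... + 1/(2N-1)^2."""
    return sum(1 / (2 * k + 1) ** 2 for k in range(N))

for N in (10, 1000, 100_000):
    print(N, sum_sv_squares(N), odd_reciprocal_squares(N), math.pi ** 2 / 8)
```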
EXERCISES 10D


1 Prove that if $T$ is a compact operator on a nonzero Hilbert space, then $\|T\|^2$ is
an eigenvalue of $T^*T$.

2 Prove that if $T$ is a self-adjoint operator on a nonzero Hilbert space $V$, then
$\|T\| = \sup\{|\langle Tf, f\rangle| : f \in V \text{ and } \|f\| = 1\}$.

3 Suppose $T$ is a bounded operator on a Hilbert space $V$ and $U$ is a closed subspace
of $V$. Prove that the following are equivalent:
(a) $U$ is an invariant subspace for $T$.
(b) $U^\perp$ is an invariant subspace for $T^*$.
(c) $TP_U = P_U T P_U$.

4 Suppose $T$ is a bounded operator on a Hilbert space $V$ and $U$ is a closed subspace
of $V$. Prove that the following are equivalent:
(a) $U$ and $U^\perp$ are invariant subspaces for $T$.
(b) $U$ and $U^\perp$ are invariant subspaces for $T^*$.
(c) $TP_U = P_U T$.

5 Suppose $T$ is a bounded operator on a nonseparable normed vector space $V$.
Prove that $T$ has a closed invariant subspace other than $\{0\}$ and $V$.

6 Suppose $T$ is an operator on a Banach space $V$ with dimension greater than 2.
Prove that $T$ has an invariant subspace other than $\{0\}$ and $V$.
[For this exercise, $T$ is not assumed to be bounded and the invariant subspace is
not required to be closed.]

7 Suppose $T$ is a self-adjoint compact operator on a Hilbert space that has only
finitely many distinct eigenvalues. Prove that $T$ has finite-dimensional range.

8 (a) Prove that if $T$ is a self-adjoint compact operator on a Hilbert space, then
there exists a self-adjoint compact operator $S$ such that $S^3 = T$.
(b) Prove that if $T$ is a normal compact operator on a complex Hilbert space,
then there exists a normal compact operator $S$ such that $S^2 = T$.

9 Suppose $T$ is a compact normal operator on a nonzero Hilbert space $V$. Prove
that there is a subspace of $V$ with dimension 1 or 2 that is an invariant subspace
for $T$.
[If $\mathbf{F} = \mathbf{C}$, the desired result follows immediately from the Spectral Theorem for
compact normal operators. Thus you can assume that $\mathbf{F} = \mathbf{R}$.]

10 Suppose $T$ is a self-adjoint compact operator on a Hilbert space and $\|T\| \leq \frac{1}{4}$.
Prove that there exists a self-adjoint compact operator $S$ such that $S^2 + S = T$.


11 For $k \in \mathbf{Z}$, define $g_k \in L^2((-\pi, \pi])$ and $h_k \in L^2((-\pi, \pi])$ by

$$g_k(t) = \frac{1}{\sqrt{2\pi}}\, e^{it/2} e^{ikt} \quad\text{and}\quad h_k(t) = \frac{1}{\sqrt{2\pi}}\, e^{ikt};$$

here we are assuming that $\mathbf{F} = \mathbf{C}$.
(a) Use the conclusion of Example 10.108 to show that $\{g_k\}_{k \in \mathbf{Z}}$ is an
orthonormal basis of $L^2((-\pi, \pi])$.
(b) Use the result in part (a) to show that $\{h_k\}_{k \in \mathbf{Z}}$ is an orthonormal basis of
$L^2((-\pi, \pi])$.
(c) Use the result in part (b) to show that the orthonormal family in the third
bullet point of Example 8.51 is an orthonormal basis of $L^2((-\pi, \pi])$.

12 Suppose $T$ is a compact operator on a Hilbert space. Prove that $s_1(T) = \|T\|$.

13 Suppose $T$ is a compact operator on a Hilbert space and $n \in \mathbf{Z}^+$. Prove that
$\dim \operatorname{range} T < n$ if and only if $s_n(T) = 0$.

14 Suppose $T$ is a compact operator on a Hilbert space $V$ with singular value
decomposition

$$Tf = \sum_{k=1}^\infty s_k(T) \langle f, e_k\rangle h_k$$

for all $f \in V$. For $n \in \mathbf{Z}^+$, define $T_n\colon V \to V$ by

$$T_n f = \sum_{k=1}^n s_k(T) \langle f, e_k\rangle h_k.$$

Prove that $\lim_{n \to \infty} \|T - T_n\| = 0$.
[This exercise gives another proof, in addition to the proof suggested by Exercise
15 in Section 10C, that an operator on a Hilbert space is compact if and only if
it is the limit of bounded operators with finite-dimensional range.]

15 Suppose $T$ is a compact operator on a Hilbert space $V$ and $n \in \mathbf{Z}^+$. Prove that
$\inf\{\|T - S\| : S \in \mathcal{B}(V) \text{ and } \dim \operatorname{range} S < n\} = s_n(T)$.

16 Suppose $T$ is a compact operator on a Hilbert space $V$ and $n \in \mathbf{Z}^+$. Prove that
$s_n(T) = \inf\{\|T|_{U^\perp}\| : U \text{ is a subspace of } V \text{ with } \dim U < n\}$.

17 Suppose $T$ is a compact operator on a Hilbert space $V$ with singular value
decomposition

$$Tf = \sum_{k \in \Omega} s_k \langle f, e_k\rangle h_k$$

for all $f \in V$. Prove that

$$T^* f = \sum_{k \in \Omega} s_k \langle f, h_k\rangle e_k$$

for all $f \in V$.
18 Suppose that $T$ is an operator on a finite-dimensional Hilbert space $V$ with
$\dim V = n$.
(a) Prove that $T$ is invertible if and only if $s_n(T) \neq 0$.
(b) Suppose $T$ is invertible and $T$ has a singular value decomposition

$$Tf = s_1(T)\langle f, e_1\rangle h_1 + \cdots + s_n(T)\langle f, e_n\rangle h_n$$

for all $f \in V$. Show that

$$T^{-1} f = \frac{\langle f, h_1\rangle}{s_1(T)}\, e_1 + \cdots + \frac{\langle f, h_n\rangle}{s_n(T)}\, e_n$$

for all $f \in V$.

19 Suppose $T$ is a compact operator on a Hilbert space $V$. Prove that

$$\sum_{k \in \Gamma} \|Te_k\|^2 = \sum_{n=1}^\infty \bigl(s_n(T)\bigr)^2$$

for every orthonormal basis $\{e_k\}_{k \in \Gamma}$ of $V$.

20 Use the result of Example 10.124 to evaluate $\displaystyle\sum_{n=1}^\infty \frac{1}{n^2}$.

21 Suppose $T$ is a normal compact operator. Prove that the following are equivalent:
• $\operatorname{range} T$ is finite-dimensional.
• $\operatorname{sp}(T)$ is a finite set.
• $s_n(T) = 0$ for some $n \in \mathbf{Z}^+$.

22 Find the singular values of the Volterra operator.
[Your answer, when combined with Exercise 12, should show that the norm of
the Volterra operator is $\frac{2}{\pi}$. This appearance of $\pi$ can be surprising because the
definition of the Volterra operator does not involve $\pi$.]
Chapter 11


Fourier Analysis 


This chapter uses Hilbert space theory to motivate the introduction of Fourier coeffi- 
cients and Fourier series. The classical setting applies these concepts to functions 
defined on bounded intervals of the real line. However, the theory becomes easier and 
cleaner when we instead use a modern approach by considering functions defined on 
the unit circle of the complex plane. 

The first section of this chapter shows how consideration of Fourier series leads us 
to harmonic functions and a solution to the Dirichlet problem. In the second section 
of this chapter, convolution becomes a major tool for the $L^p$ theory.

The third section of this chapter changes the context to functions defined on the 
real line. Many of the techniques introduced in the first two sections of the chapter 
transfer easily to provide results about the Fourier transform on the real line. The 
highlights of our treatment of the Fourier transform are the Fourier Inversion Formula 
and the extension of the Fourier transform to a unitary operator on $L^2(\mathbf{R})$.

The vast field of Fourier analysis cannot be completely covered in a single chapter. 
Thus this chapter gives readers just a taste of the subject. Readers who go on from 
this chapter to one of the many book-length treatments of Fourier analysis will then 
already be familiar with the terminology and techniques of the subject. 




The Giza pyramids, near where the Battle of the Pyramids took place in 1798 during Napoleon's invasion of Egypt. Joseph Fourier (1768–1830) was one of the scientific advisors to Napoleon in Egypt. While in Egypt as part of Napoleon's invading force, Fourier began thinking about the mathematical theory of heat propagation, which eventually led to what we now call Fourier series and the Fourier transform.
CC-BY-SA Ricardo Liberato
11A Fourier Series and Poisson Integral


Fourier Coefficients and Riemann–Lebesgue Lemma


For $k \in \mathbf{Z}$, suppose $e_k\colon (-\pi, \pi] \to \mathbf{R}$ is defined by
$$e_k(t) = \begin{cases}\frac{1}{\sqrt{\pi}}\sin(kt) & \text{if } k > 0,\\ \frac{1}{\sqrt{2\pi}} & \text{if } k = 0,\\ \frac{1}{\sqrt{\pi}}\cos(kt) & \text{if } k < 0.\end{cases}\tag{11.1}$$

The classical theory of Fourier series features $\{e_k\}_{k\in\mathbf{Z}}$ as an orthonormal basis of $L^2((-\pi, \pi])$. The trigonometric formulas displayed in Exercise 1 in Section 8C can be used to show that $\{e_k\}_{k\in\mathbf{Z}}$ is indeed an orthonormal family in $L^2((-\pi, \pi])$.

To show that $\{e_k\}_{k\in\mathbf{Z}}$ is an orthonormal basis of $L^2((-\pi, \pi])$ requires more work. One slick possibility is to note that the Spectral Theorem for compact operators produces orthonormal bases; an appropriate choice of a compact normal operator can then be used to show that $\{e_k\}_{k\in\mathbf{Z}}$ is an orthonormal basis of $L^2((-\pi, \pi])$ [see Exercise 11(c) in Section 10D].

In this chapter we take a cleaner approach to Fourier series by working on the unit circle in the complex plane instead of on the interval $(-\pi, \pi]$. The map
$$t \mapsto e^{it} = \cos t + i\sin t\tag{11.2}$$
can be used to identify the interval $(-\pi, \pi]$ with the unit circle; thus the two approaches are equivalent. However, the calculations are easier in the unit circle context. In addition, we will see that the unit circle context provides the huge benefit of making a connection with harmonic functions.

We begin by introducing notation for the open unit disk and the unit circle in the 
complex plane. 


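11.3 Definition $\mathbf{D}$; $\partial\mathbf{D}$

• $\mathbf{D}$ denotes the open unit disk in the complex plane:
$$\mathbf{D} = \{w \in \mathbf{C} : |w| < 1\}.$$
• $\partial\mathbf{D}$ is the unit circle in the complex plane:
$$\partial\mathbf{D} = \{z \in \mathbf{C} : |z| = 1\}.$$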


The function given in 11.2 is a one-to-one map of $(-\pi, \pi]$ onto $\partial\mathbf{D}$. We use this map to define a $\sigma$-algebra on $\partial\mathbf{D}$ by transferring the Borel subsets of $(-\pi, \pi]$ to subsets of $\partial\mathbf{D}$ that we will call the measurable subsets of $\partial\mathbf{D}$. We also transfer Lebesgue measure on the Borel subsets of $(-\pi, \pi]$ to a measure called $\sigma$ on the measurable subsets of $\partial\mathbf{D}$, except that for convenience we normalize by dividing by $2\pi$ so that the measure of $\partial\mathbf{D}$ is 1 rather than $2\pi$. We are now ready to give the formal definitions.
• A subset $E$ of $\partial\mathbf{D}$ is measurable if $\{t \in (-\pi, \pi] : e^{it} \in E\}$ is a Borel subset of $\mathbf{R}$.

• $\sigma$ is the measure on the measurable subsets of $\partial\mathbf{D}$ obtained by transferring Lebesgue measure from $(-\pi, \pi]$ to $\partial\mathbf{D}$, normalized so that $\sigma(\partial\mathbf{D}) = 1$. In other words, if $E \subset \partial\mathbf{D}$ is measurable, then
$$\sigma(E) = \frac{\bigl|\{t \in (-\pi, \pi] : e^{it} \in E\}\bigr|}{2\pi}.$$


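Our definition of the measure $\sigma$ on $\partial\mathbf{D}$ allows us to transfer integration on $\partial\mathbf{D}$ to the familiar context of integration on $(-\pi, \pi]$. Specifically,
$$\int_{\partial\mathbf{D}} f\,d\sigma = \int_{\partial\mathbf{D}} f(z)\,d\sigma(z) = \int_{-\pi}^{\pi} f(e^{it})\,\frac{dt}{2\pi}$$
for all measurable functions $f\colon \partial\mathbf{D} \to \mathbf{C}$ such that any of these integrals is defined.

Throughout this chapter, we assume that the scalar field $\mathbf{F}$ is the complex field $\mathbf{C}$. Furthermore, $L^p(\partial\mathbf{D})$ is defined as follows.

For $1 \le p \le \infty$, define $L^p(\partial\mathbf{D})$ to mean the complex version ($\mathbf{F} = \mathbf{C}$) of $L^p(\sigma)$.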


Note that if $z = e^{it}$ for some $t \in \mathbf{R}$, then $\bar{z} = e^{-it} = \frac{1}{z}$ and $z^n = e^{int}$ and $\bar{z}^n = e^{-int}$ for all $n \in \mathbf{Z}$. These observations make the proof of the next result much simpler than the proof of the corresponding result for the trigonometric family defined by 11.1.

In the statement of the next result, $z^n$ means the function on $\partial\mathbf{D}$ defined by $z \mapsto z^n$.


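11.6 orthonormal family in $L^2(\partial\mathbf{D})$

$\{z^n\}_{n\in\mathbf{Z}}$ is an orthonormal family in $L^2(\partial\mathbf{D})$.

Proof If $n \in \mathbf{Z}$, then
$$\langle z^n, z^n\rangle = \int_{\partial\mathbf{D}}|z^n|^2\,d\sigma(z) = \int_{\partial\mathbf{D}} 1\,d\sigma = 1.$$
If $m, n \in \mathbf{Z}$ with $m \neq n$, then
$$\langle z^m, z^n\rangle = \int_{-\pi}^{\pi} e^{imt}e^{-int}\,\frac{dt}{2\pi} = \int_{-\pi}^{\pi} e^{i(m-n)t}\,\frac{dt}{2\pi} = \frac{e^{i(m-n)t}}{i(m-n)2\pi}\Big|_{t=-\pi}^{t=\pi} = 0,$$
as desired.

In the next section, we improve the result above by showing that $\{z^n\}_{n\in\mathbf{Z}}$ is an orthonormal basis of $L^2(\partial\mathbf{D})$ (see 11.30).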
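The orthonormality relations in 11.6 can also be checked numerically. The following Python sketch is only an illustration (the helper names `circle_inner_product` and `monomial` are hypothetical, not from the text): it replaces the integral with respect to $\sigma$ by an average over equally spaced points of $\partial\mathbf{D}$. The average of $z^k$ over these points is exactly 0 whenever $k$ is a nonzero integer not divisible by the number of points, so for small $|m|$ and $|n|$ the computed Gram matrix agrees with the identity matrix up to rounding.

```python
import cmath
import math


def circle_inner_product(f, g, num_points=256):
    """Approximate <f, g> = integral of f * conj(g) d(sigma) over the unit circle.

    sigma is normalized arc length, so the integral is approximated by the average
    over the points z_j = e^{i t_j}, t_j = -pi + 2*pi*j/num_points.
    """
    total = 0j
    for j in range(num_points):
        z = cmath.exp(1j * (-math.pi + 2 * math.pi * j / num_points))
        total += f(z) * g(z).conjugate()
    return total / num_points


def monomial(n):
    """The function z -> z^n on the unit circle; n may be negative."""
    return lambda z: z ** n


# Gram matrix of z^-3, ..., z^3: it should be the 7 x 7 identity matrix.
for m in range(-3, 4):
    for n in range(-3, 4):
        expected = 1.0 if m == n else 0.0
        assert abs(circle_inner_product(monomial(m), monomial(n)) - expected) < 1e-12
```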
Hilbert space theory tells us that if $f$ is in the closure in $L^2(\partial\mathbf{D})$ of $\operatorname{span}\{z^n\}_{n\in\mathbf{Z}}$, then
$$f = \sum_{n\in\mathbf{Z}}\langle f, z^n\rangle z^n,$$
where the infinite sum above converges as an unordered sum in the norm of $L^2(\partial\mathbf{D})$ (see 8.58). The inner product $\langle f, z^n\rangle$ above equals
$$\int_{\partial\mathbf{D}} f(z)\bar{z}^n\,d\sigma(z).$$
Because $|z^n| = 1$ for every $z \in \partial\mathbf{D}$, the integral above makes sense not only for $f \in L^2(\partial\mathbf{D})$ but also for $f$ in the larger space $L^1(\partial\mathbf{D})$. Thus we make the following definition.

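11.7 Definition Fourier coefficient

Suppose $f \in L^1(\partial\mathbf{D})$.

• For $n \in \mathbf{Z}$, the $n^{\text{th}}$ Fourier coefficient of $f$ is denoted $\hat{f}(n)$ and is defined by
$$\hat{f}(n) = \int_{\partial\mathbf{D}} f(z)\bar{z}^n\,d\sigma(z) = \int_{-\pi}^{\pi} f(e^{it})e^{-int}\,\frac{dt}{2\pi}.$$
• The Fourier series of $f$ is the formal sum
$$\sum_{n=-\infty}^{\infty}\hat{f}(n)z^n.$$

As we will see, Fourier analysis helps describe the sense in which the Fourier series of $f$ represents $f$.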


11.8 Example Fourier coefficients

• Suppose $h$ is an analytic function on an open set that contains $\overline{\mathbf{D}}$. Then $h$ has a power series representation
$$h(z) = \sum_{n=0}^{\infty} a_n z^n,$$
where the sum on the right converges uniformly on $\overline{\mathbf{D}}$ to $h$. Because uniform convergence on $\partial\mathbf{D}$ implies convergence in $L^2(\partial\mathbf{D})$, 8.58(b) and 11.6 now imply that
$$(h|_{\partial\mathbf{D}})^{\wedge}(n) = \begin{cases} a_n & \text{if } n \ge 0,\\ 0 & \text{if } n < 0\end{cases}$$
for all $n \in \mathbf{Z}$. In other words, for functions analytic on an open set containing $\overline{\mathbf{D}}$, the Fourier series is the same as the Taylor series.
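• Suppose $f\colon \partial\mathbf{D} \to \mathbf{R}$ is defined by
$$f(z) = \frac{1}{|3 - z|^2}.$$
Then for $z \in \partial\mathbf{D}$ we have
$$\begin{aligned}f(z) &= \frac{1}{(3 - z)(3 - \bar{z})} = \frac{1}{8}\Bigl(\frac{3}{3 - z} + \frac{\bar{z}}{3 - \bar{z}}\Bigr)\\ &= \frac{1}{8}\Bigl(\frac{1}{1 - \frac{z}{3}} + \frac{\bar{z}/3}{1 - \frac{\bar{z}}{3}}\Bigr) = \frac{1}{8}\Bigl(\sum_{n=0}^{\infty}\frac{z^n}{3^n} + \sum_{n=1}^{\infty}\frac{\bar{z}^n}{3^n}\Bigr),\end{aligned}$$
where the infinite sums converge uniformly on $\partial\mathbf{D}$. Thus
$$\hat{f}(n) = \frac{1}{8}\cdot\frac{1}{3^{|n|}}$$
for all $n \in \mathbf{Z}$.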
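Here is a minimal Python sketch of how one might approximate the Fourier coefficients $\hat{f}(n)$ numerically, assuming the integrand is smooth enough for an equally spaced average to be accurate (the function name `fourier_coefficient` is hypothetical). Applied to the entire function $h(z) = e^z$, it reproduces the conclusion of the first bullet point of Example 11.8: the Fourier coefficients are the Taylor coefficients $1/n!$ for $n \ge 0$ and vanish for $n < 0$.

```python
import cmath
import math


def fourier_coefficient(f, n, num_points=256):
    """Approximate f^(n) = (1/2pi) * integral_{-pi}^{pi} f(e^{it}) e^{-int} dt.

    Uses the average over num_points equally spaced values of t. For smooth f the
    error decays very rapidly as num_points grows; for discontinuous f it is
    typically only O(1/num_points).
    """
    total = 0j
    for j in range(num_points):
        t = -math.pi + 2 * math.pi * j / num_points
        total += f(cmath.exp(1j * t)) * cmath.exp(-1j * n * t)
    return total / num_points


# h(z) = e^z is analytic on all of C, so its Fourier coefficients should equal
# its Taylor coefficients: 1/n! for n >= 0 and 0 for n < 0.
for n in range(-4, 8):
    expected = 1 / math.factorial(n) if n >= 0 else 0.0
    assert abs(fourier_coefficient(cmath.exp, n) - expected) < 1e-12
```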


We begin with some simple algebraic properties of Fourier coefficients, whose 
proof is left to the reader. 


11.9 algebraic properties of Fourier coefficients

Suppose $f, g \in L^1(\partial\mathbf{D})$ and $n \in \mathbf{Z}$. Then

(a) $(f + g)^{\wedge}(n) = \hat{f}(n) + \hat{g}(n)$;

(b) $(\alpha f)^{\wedge}(n) = \alpha\hat{f}(n)$ for all $\alpha \in \mathbf{C}$;

(c) $|\hat{f}(n)| \le \|f\|_1$.

Parts (a) and (b) above could be restated by saying that for each $n \in \mathbf{Z}$, the function $f \mapsto \hat{f}(n)$ is a linear functional from $L^1(\partial\mathbf{D})$ to $\mathbf{C}$. Part (c) could be restated by saying that this linear functional has norm at most 1.

Part (c) above implies that the set of Fourier coefficients $\{\hat{f}(n)\}_{n\in\mathbf{Z}}$ is bounded for each $f \in L^1(\partial\mathbf{D})$. The Fourier coefficients of the functions in Example 11.8 have the stronger property that $\lim_{n\to\pm\infty}\hat{f}(n) = 0$. The next result shows that this stronger conclusion holds for all functions in $L^1(\partial\mathbf{D})$.
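11.10 Riemann–Lebesgue Lemma

Suppose $f \in L^1(\partial\mathbf{D})$. Then $\lim_{n\to\pm\infty}\hat{f}(n) = 0$.

Proof Suppose $\varepsilon > 0$. There exists $g \in L^2(\partial\mathbf{D})$ such that $\|f - g\|_1 < \varepsilon$ (by 3.44). By 11.6 and Bessel's inequality (8.57), we have
$$\sum_{n=-\infty}^{\infty}|\hat{g}(n)|^2 \le \|g\|_2^2 < \infty.$$
Thus there exists $M \in \mathbf{Z}^+$ such that $|\hat{g}(n)| < \varepsilon$ for all $n \in \mathbf{Z}$ with $|n| \ge M$. Now if $n \in \mathbf{Z}$ and $|n| \ge M$, then
$$\begin{aligned}|\hat{f}(n)| &\le |\hat{f}(n) - \hat{g}(n)| + |\hat{g}(n)|\\ &< |(f - g)^{\wedge}(n)| + \varepsilon\\ &\le \|f - g\|_1 + \varepsilon\\ &< 2\varepsilon.\end{aligned}$$
Thus $\lim_{n\to\pm\infty}\hat{f}(n) = 0$.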


Poisson Kernel 


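Suppose $f\colon \partial\mathbf{D} \to \mathbf{C}$ is continuous and $z \in \partial\mathbf{D}$. For this fixed $z \in \partial\mathbf{D}$, the Fourier series
$$\sum_{n=-\infty}^{\infty}\hat{f}(n)z^n$$
is a series of complex numbers. It would be nice if $f(z) = \sum_{n=-\infty}^{\infty}\hat{f}(n)z^n$, but this is not necessarily true because the series $\sum_{n=-\infty}^{\infty}\hat{f}(n)z^n$ might not converge, as you can see in Exercise 11.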

Various techniques exist for trying to assign some meaning to a series of complex numbers that does not converge. In one such technique, called Abel summation, the $n^{\text{th}}$ term of the series is multiplied by $r^n$ and then the limit is taken as $r \uparrow 1$. For example, if the $n^{\text{th}}$ term of the divergent series
$$1 - 1 + 1 - 1 + \cdots$$
is multiplied by $r^n$ for $r \in [0, 1)$, we get a convergent series whose sum equals $\frac{1}{1 + r}$. Taking the limit of this sum as $r \uparrow 1$ then gives $\frac{1}{2}$ as the value of the Abel sum of the series above.
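A short Python sketch of the Abel summation example above (the function name `abel_weighted_sum` is hypothetical). It truncates $\sum_{n\ge 0}(-1)^n r^n$ after enough terms that the neglected tail is negligible for the values of $r$ used, and compares the result with $\frac{1}{1 + r}$; the unweighted partial sums, by contrast, keep alternating between 1 and 0.

```python
def abel_weighted_sum(r, num_terms=100_000):
    """Truncation of sum_{n >= 0} (-1)^n r^n, the series 1 - 1 + 1 - ... weighted by r^n."""
    total = 0.0
    power = 1.0  # holds r^n
    for n in range(num_terms):
        total += power if n % 2 == 0 else -power
        power *= r
    return total


for r in (0.5, 0.9, 0.99, 0.999):
    # The weighted series sums to 1/(1 + r), which tends to 1/2 as r increases to 1.
    assert abs(abel_weighted_sum(r) - 1 / (1 + r)) < 1e-9

# Without the weights, the partial sums 1, 0, 1, 0, ... never settle down.
partial_sums = [sum((-1) ** n for n in range(m + 1)) for m in range(6)]
assert partial_sums == [1, 0, 1, 0, 1, 0]
```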

The next definition can be motivated by applying a similar technique to the Fourier series $\sum_{n=-\infty}^{\infty}\hat{f}(n)z^n$. Here we have a series of complex numbers whose terms are indexed by $\mathbf{Z}$ rather than by $\mathbf{Z}^+$. Thus we use $r^{|n|}$ rather than $r^n$ because we want these multipliers to have limit 0 as $n \to \pm\infty$ for each $r \in [0, 1)$ (and to have limit 1 as $r \uparrow 1$ for each $n \in \mathbf{Z}$).
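11.11 Definition $\mathcal{P}_r f$

For $f \in L^1(\partial\mathbf{D})$ and $0 \le r < 1$, define $\mathcal{P}_r f\colon \partial\mathbf{D} \to \mathbf{C}$ by
$$(\mathcal{P}_r f)(z) = \sum_{n=-\infty}^{\infty} r^{|n|}\hat{f}(n)z^n.$$

No convergence problems arise in the series above because
$$\bigl|r^{|n|}\hat{f}(n)z^n\bigr| \le \|f\|_1 r^{|n|}$$
for each $z \in \partial\mathbf{D}$, which implies that
$$\sum_{n=-\infty}^{\infty}\bigl|r^{|n|}\hat{f}(n)z^n\bigr| \le \|f\|_1\frac{1 + r}{1 - r} < \infty.$$
Thus for each $r \in [0, 1)$, the partial sums of the series above converge uniformly on $\partial\mathbf{D}$, which implies that $\mathcal{P}_r f$ is a continuous function from $\partial\mathbf{D}$ to $\mathbf{C}$ (for $r = 0$ and $n = 0$, interpret the expression $0^0$ to be 1).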
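Let's unravel the formula in 11.11. If $f \in L^1(\partial\mathbf{D})$, $0 \le r < 1$, and $z \in \partial\mathbf{D}$, then
$$\begin{aligned}(\mathcal{P}_r f)(z) &= \sum_{n=-\infty}^{\infty} r^{|n|}\hat{f}(n)z^n\\ &= \sum_{n=-\infty}^{\infty} r^{|n|}\int_{\partial\mathbf{D}} f(w)\bar{w}^n\,d\sigma(w)\,z^n\\ &= \int_{\partial\mathbf{D}} f(w)\Bigl(\sum_{n=-\infty}^{\infty} r^{|n|}(z\bar{w})^n\Bigr)\,d\sigma(w),\end{aligned}\tag{11.12}$$
where interchanging the sum and integral above is justified by the uniform convergence of the series on $\partial\mathbf{D}$. To evaluate the sum in parentheses in the last line above, let $\zeta \in \partial\mathbf{D}$ (think of $\zeta = z\bar{w}$ in the formula above). Thus $\zeta^{-n} = \bar{\zeta}^n$ and
$$\begin{aligned}\sum_{n=-\infty}^{\infty} r^{|n|}\zeta^n &= \sum_{n=0}^{\infty}(r\zeta)^n + \sum_{n=1}^{\infty}(r\bar{\zeta})^n\\ &= \frac{1}{1 - r\zeta} + \frac{r\bar{\zeta}}{1 - r\bar{\zeta}}\\ &= \frac{(1 - r\bar{\zeta}) + r\bar{\zeta}(1 - r\zeta)}{|1 - r\zeta|^2}\\ &= \frac{1 - r^2}{|1 - r\zeta|^2}.\end{aligned}\tag{11.13}$$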


Motivated by the formula above, we now make the following definition. Notice 
that 11.11 uses calligraphic P, while the next definition uses italic P. 
11.14 Definition

• For $0 \le r < 1$, define $P_r\colon \partial\mathbf{D} \to (0, \infty)$ by
$$P_r(\zeta) = \frac{1 - r^2}{|1 - r\zeta|^2}.$$
• The family of functions $\{P_r\}_{r\in[0,1)}$ is called the Poisson kernel on $\mathbf{D}$.
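The closed form in Definition 11.14 can be compared numerically with the series in 11.13. In this Python sketch (helper names hypothetical) the series is truncated to $|n| \le$ `cutoff`, which is harmless here because $r^{|n|}$ is negligible beyond that point for the values of $r$ tested; the check also confirms that $P_r(\zeta)$ is real.

```python
import cmath


def poisson_kernel(r, zeta):
    """Closed form P_r(zeta) = (1 - r^2) / |1 - r*zeta|^2 from Definition 11.14."""
    return (1 - r * r) / abs(1 - r * zeta) ** 2


def poisson_kernel_series(r, zeta, cutoff=400):
    """Truncation of sum_{n in Z} r^{|n|} zeta^n to |n| <= cutoff (see 11.13)."""
    total = 0j
    for n in range(-cutoff, cutoff + 1):
        total += r ** abs(n) * zeta ** n  # for r = 0 and n = 0, Python gives 0.0 ** 0 == 1.0
    return total


for r in (0.0, 0.3, 0.9):
    for t in (-3.0, -1.0, 0.0, 0.5, 2.0):
        zeta = cmath.exp(1j * t)
        series = poisson_kernel_series(r, zeta)
        assert abs(series - poisson_kernel(r, zeta)) < 1e-9
        assert abs(series.imag) < 1e-9  # the kernel is real-valued
```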


Combining 11.12 and 11.13 now gives the following result. 
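11.15 integral formula for $\mathcal{P}_r f$

If $f \in L^1(\partial\mathbf{D})$, $0 \le r < 1$, and $z \in \partial\mathbf{D}$, then
$$(\mathcal{P}_r f)(z) = \int_{\partial\mathbf{D}} f(w)P_r(z\bar{w})\,d\sigma(w) = \int_{\partial\mathbf{D}} f(w)\frac{1 - r^2}{|1 - rz\bar{w}|^2}\,d\sigma(w).$$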


The terminology approximate identity is sometimes used to describe the three 
properties for the Poisson kernel given in the next result. 


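11.16 properties of $P_r$

(a) $P_r(\zeta) > 0$ for all $r \in [0, 1)$ and all $\zeta \in \partial\mathbf{D}$.

(b) $\int_{\partial\mathbf{D}} P_r(\zeta)\,d\sigma(\zeta) = 1$ for each $r \in [0, 1)$.

(c) $\lim_{r\uparrow 1}\int_{\{\zeta\in\partial\mathbf{D} : |1 - \zeta| \ge \delta\}} P_r(\zeta)\,d\sigma(\zeta) = 0$ for each $\delta > 0$.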


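Proof Part (a) follows immediately from the definition of $P_r(\zeta)$ given in 11.14. Part (b) follows from integrating the series representation for $P_r$ given by 11.13 termwise and noting that
$$\int_{\partial\mathbf{D}}\zeta^n\,d\sigma(\zeta) = \int_{-\pi}^{\pi} e^{int}\,\frac{dt}{2\pi} = \frac{e^{int}}{in\,2\pi}\Big|_{t=-\pi}^{t=\pi} = 0 \quad\text{for all } n \in \mathbf{Z}\setminus\{0\};$$
for $n = 0$, we have $\int_{\partial\mathbf{D}}\zeta^0\,d\sigma(\zeta) = 1$.

To prove part (c), suppose $\delta > 0$. If $\zeta \in \partial\mathbf{D}$, $|1 - \zeta| \ge \delta$, and $1 - r < \frac{\delta}{2}$, then
$$|1 - r\zeta| = \bigl|(1 - \zeta) + (1 - r)\zeta\bigr| \ge |1 - \zeta| - (1 - r) > \frac{\delta}{2}.$$
Thus as $r \uparrow 1$, the denominator in the definition of $P_r(\zeta)$ is uniformly bounded away from 0 on $\{\zeta \in \partial\mathbf{D} : |1 - \zeta| \ge \delta\}$ and the numerator goes to 0. Thus the integral of $P_r$ over $\{\zeta \in \partial\mathbf{D} : |1 - \zeta| \ge \delta\}$ goes to 0 as $r \uparrow 1$.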
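The following Python sketch (an illustration with hypothetical helper names, not part of the text) approximates the integrals in 11.16(b) and 11.16(c) by averages over equally spaced points of $\partial\mathbf{D}$. The total mass stays at 1, while the mass on $\{\zeta : |1 - \zeta| \ge \delta\}$ shrinks as $r$ increases to 1.

```python
import cmath
import math


def poisson_kernel_at(r, t):
    """P_r(e^{it}) computed from Definition 11.14: (1 - r^2) / |1 - r e^{it}|^2."""
    return (1 - r * r) / abs(1 - r * cmath.exp(1j * t)) ** 2


def kernel_mass(r, delta=None, num_points=20000):
    """Average of P_r over equally spaced points of the circle, approximating its sigma-integral.

    If delta is given, only points zeta with |1 - zeta| >= delta contribute, which
    approximates the integral appearing in 11.16(c).
    """
    total = 0.0
    for j in range(num_points):
        t = -math.pi + 2 * math.pi * j / num_points
        if delta is None or abs(1 - cmath.exp(1j * t)) >= delta:
            total += poisson_kernel_at(r, t)
    return total / num_points


print(kernel_mass(0.9))  # close to 1, as in 11.16(b)
for r in (0.9, 0.99, 0.999):
    print(r, kernel_mass(r, delta=0.5))  # decreases toward 0, as in 11.16(c)
```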
Here is the intuition behind the proof of the next result: Parts (a) and (b) of the previous result and 11.15 mean that $(\mathcal{P}_r f)(z)$ is a weighted average of $f$. Part (c) of the previous result says that for $r$ close to 1, most of the weight in this weighted average is concentrated near $z$. Thus $(\mathcal{P}_r f)(z) \to f(z)$ as $r \uparrow 1$.

The figure here transfers the context from $\partial\mathbf{D}$ to $(-\pi, \pi]$. The area under both curves is $2\pi$ [corresponding to 11.16(b)] and $P_r(e^{it})$ becomes more concentrated near $t = 0$ as $r \uparrow 1$ [corresponding to 11.16(c)]. See Exercise 3 for the formula for $P_r(e^{it})$.

[Figure: The graphs of $P_{1/2}(e^{it})$ [red] and $P_{3/4}(e^{it})$ [blue] on $(-\pi, \pi]$.]

One more ingredient is needed for the next proof: If $h \in L^1(\partial\mathbf{D})$ and $z \in \partial\mathbf{D}$, then
$$\int_{\partial\mathbf{D}} h(z\bar{w})\,d\sigma(w) = \int_{\partial\mathbf{D}} h(\zeta)\,d\sigma(\zeta).\tag{11.17}$$


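The equation above holds because the measure $\sigma$ is rotation and reflection invariant. In other words, $\sigma(\{w \in \partial\mathbf{D} : h(z\bar{w}) \in E\}) = \sigma(\{\zeta \in \partial\mathbf{D} : h(\zeta) \in E\})$ for all Borel sets $E \subset \mathbf{C}$.

11.18 if $f$ is continuous, then $\mathcal{P}_r f$ converges uniformly to $f$

Suppose $f\colon \partial\mathbf{D} \to \mathbf{C}$ is continuous. Then $\mathcal{P}_r f$ converges uniformly to $f$ on $\partial\mathbf{D}$ as $r \uparrow 1$.
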
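Proof Suppose $\varepsilon > 0$. Because $f$ is uniformly continuous on $\partial\mathbf{D}$, there exists $\delta > 0$ such that
$$|f(z) - f(w)| < \varepsilon \text{ for all } z, w \in \partial\mathbf{D} \text{ with } |z - w| < \delta.$$
If $z \in \partial\mathbf{D}$, then
$$\begin{aligned}|f(z) - (\mathcal{P}_r f)(z)| &= \Bigl|f(z) - \int_{\partial\mathbf{D}} f(w)P_r(z\bar{w})\,d\sigma(w)\Bigr|\\ &= \Bigl|\int_{\partial\mathbf{D}}\bigl(f(z) - f(w)\bigr)P_r(z\bar{w})\,d\sigma(w)\Bigr|\\ &\le \varepsilon\int_{\{w\in\partial\mathbf{D} : |z - w| < \delta\}} P_r(z\bar{w})\,d\sigma(w) + 2\|f\|_\infty\int_{\{w\in\partial\mathbf{D} : |z - w| \ge \delta\}} P_r(z\bar{w})\,d\sigma(w)\\ &\le \varepsilon + 2\|f\|_\infty\int_{\{\zeta\in\partial\mathbf{D} : |1 - \zeta| \ge \delta\}} P_r(\zeta)\,d\sigma(\zeta),\end{aligned}$$
where we have used 11.17, 11.16(a), and 11.16(b); the last line uses the equality $|z - w| = |1 - \zeta|$, which holds when $\zeta = z\bar{w}$. Now 11.16(c) shows that the value of the last integral above has uniform (with respect to $z \in \partial\mathbf{D}$) limit 0 as $r \uparrow 1$.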
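To see 11.18 in action, the following Python sketch (hypothetical helper names) evaluates $(\mathcal{P}_r f)(z)$ through the integral formula 11.15, approximating the integral by an average over equally spaced points, for the continuous function $f(e^{it}) = |t|$ on $(-\pi, \pi]$ (an illustrative choice). The largest error over a grid of points $z$ decreases as $r$ increases to 1; note that a grid maximum only estimates $\|f - \mathcal{P}_r f\|_\infty$ from below.

```python
import cmath
import math


def poisson_integral(f_tilde, r, t, num_points=2048):
    """Approximate (P_r f)(e^{it}) = integral of f(w) P_r(z conj(w)) d(sigma)(w), z = e^{it}.

    f_tilde(s) must return f(e^{is}). Writing w = e^{is} gives z conj(w) = e^{i(t - s)},
    and P_r is evaluated directly from its definition (1 - r^2) / |1 - r*zeta|^2.
    """
    total = 0.0
    for j in range(num_points):
        s = -math.pi + 2 * math.pi * j / num_points
        zeta = cmath.exp(1j * (t - s))
        kernel = (1 - r * r) / abs(1 - r * zeta) ** 2
        total += f_tilde(s) * kernel
    return total / num_points


def max_grid_error(f_tilde, r, num_eval=64):
    """Largest |f(z) - (P_r f)(z)| over num_eval equally spaced points z of the circle."""
    worst = 0.0
    for k in range(num_eval):
        t = -math.pi + 2 * math.pi * (k + 0.5) / num_eval
        worst = max(worst, abs(f_tilde(t) - poisson_integral(f_tilde, r, t)))
    return worst


# f(e^{it}) = |t| for t in (-pi, pi] is continuous on the circle, since |pi| = |-pi|.
errors = [max_grid_error(abs, r) for r in (0.5, 0.8, 0.95)]
assert errors[0] > errors[1] > errors[2]
```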
Solution to Dirichlet Problem on Disk

As a bonus to our investigation into Fourier series, the previous result provides the solution to the Dirichlet problem on the unit disk. To state the Dirichlet problem, we first need a few definitions. As usual, we identify $\mathbf{C}$ with $\mathbf{R}^2$. Thus for $x, y \in \mathbf{R}$, we can think of $w = x + yi \in \mathbf{C}$ or $w = (x, y) \in \mathbf{R}^2$. Hence
$$\mathbf{D} = \{w \in \mathbf{C} : |w| < 1\} = \{(x, y) \in \mathbf{R}^2 : x^2 + y^2 < 1\}.$$

For a function $f\colon G \to \mathbf{C}$ on an open subset $G$ of $\mathbf{C}$ (or an open subset $G$ of $\mathbf{R}^2$), the partial derivatives $D_1 f$ and $D_2 f$ are defined as in 5.46 except that now we allow $f$ to be a complex-valued function. Clearly $D_j f = D_j(\operatorname{Re} f) + iD_j(\operatorname{Im} f)$ for $j = 1, 2$.


11.19 Definition harmonic function

A function $u\colon G \to \mathbf{C}$ on an open subset $G$ of $\mathbf{R}^2$ is called harmonic if
$$\bigl(D_1(D_1 u)\bigr)(w) + \bigl(D_2(D_2 u)\bigr)(w) = 0$$
for all $w \in G$. The left side of the equation above is called the Laplacian of $u$ at $w$ and is often denoted by $(\Delta u)(w)$.


11.20 Example harmonic functions

• If $f\colon G \to \mathbf{C}$ is an analytic function on an open set $G \subset \mathbf{C}$, then the functions $\operatorname{Re} f$, $\operatorname{Im} f$, $f$, and $\bar{f}$ are all harmonic functions on $G$, as is usually discussed near the beginning of a course on complex analysis.
• If $\zeta \in \partial\mathbf{D}$, then the function
$$w \mapsto \frac{1 - |w|^2}{|\zeta - w|^2}$$
is harmonic on $\mathbf{C}\setminus\{\zeta\}$ (see Exercise 7).
• The function $u\colon \mathbf{C}\setminus\{0\} \to \mathbf{R}$ defined by $u(w) = \log|w|$ is harmonic on $\mathbf{C}\setminus\{0\}$, as you should verify. However, there does not exist a function $f$ analytic on $\mathbf{C}\setminus\{0\}$ such that $u = \operatorname{Re} f$.


The Dirichlet problem asks to extend a continuous function on the boundary of an open subset of $\mathbf{R}^2$ to a function that is harmonic on the open set and continuous on the closure of the open set. Here is a more formal statement:

11.21 Dirichlet problem on $G$: Suppose $G \subset \mathbf{R}^2$ is an open set and $f\colon \partial G \to \mathbf{C}$ is a continuous function. Find a continuous function $u\colon \overline{G} \to \mathbf{C}$ such that $u|_G$ is harmonic and $u|_{\partial G} = f$.


For some open sets $G \subset \mathbf{R}^2$, there exist continuous functions $f$ on $\partial G$ whose Dirichlet problem has no solution. However, the situation on the open unit disk $\mathbf{D}$ is much nicer, as we will soon see.
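The function $u$ defined in the result below is called the Poisson integral of $f$ on $\mathbf{D}$.

11.22 Poisson integral is harmonic

Suppose $f \in L^1(\partial\mathbf{D})$. Define $u\colon \mathbf{D} \to \mathbf{C}$ by
$$u(rz) = (\mathcal{P}_r f)(z)$$
for $r \in [0, 1)$ and $z \in \partial\mathbf{D}$. Then $u$ is harmonic on $\mathbf{D}$.

Proof If $w \in \mathbf{D}$, then $w = rz$ for some $r \in [0, 1)$ and some $z \in \partial\mathbf{D}$. Thus
$$\begin{aligned}u(w) &= (\mathcal{P}_r f)(z)\\ &= \sum_{n=0}^{\infty}\hat{f}(n)(rz)^n + \sum_{n=1}^{\infty}\hat{f}(-n)(r\bar{z})^n\\ &= \sum_{n=0}^{\infty}\hat{f}(n)w^n + \overline{\sum_{n=1}^{\infty}\overline{\hat{f}(-n)}\,w^n}.\end{aligned}$$
Every function that has a power series representation on $\mathbf{D}$ is analytic on $\mathbf{D}$. Thus the equation above shows that $u$ is the sum of an analytic function and the complex conjugate of an analytic function. Hence $u$ is harmonic.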
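The proof of 11.22 writes $u$ as an analytic function plus the conjugate of an analytic function. The following Python sketch (hypothetical names; the sample coefficients are arbitrary and chosen only for illustration) builds such a $u$ from finitely many Fourier coefficients and applies the standard five-point finite-difference approximation of the Laplacian. The stencil is exact for polynomials in $x, y$ of degree at most 3, so only rounding error remains; for comparison, the non-harmonic function $|w|^2$ has Laplacian 4.

```python
def poisson_integral_from_coefficients(coeffs, w):
    """u(w) = sum_{n >= 0} c_n w^n + sum_{n >= 1} c_{-n} conj(w)^n, as in the proof of 11.22.

    coeffs maps n to the Fourier coefficient c_n; only finitely many are used here.
    """
    total = 0j
    for n, c in coeffs.items():
        total += c * (w ** n if n >= 0 else w.conjugate() ** (-n))
    return total


def discrete_laplacian(u, w, h=1e-3):
    """Five-point approximation of (D1 D1 u + D2 D2 u)(w), where w = x + iy."""
    return (u(w + h) + u(w - h) + u(w + 1j * h) + u(w - 1j * h) - 4 * u(w)) / h ** 2


SAMPLE_COEFFS = {0: 1.0, 1: 0.5, -1: 0.5j, 3: -0.25, -2: 2.0}  # arbitrary illustrative values


def u(w):
    return poisson_integral_from_coefficients(SAMPLE_COEFFS, w)


for w in (0.3 + 0.4j, -0.5 + 0.1j, 0.05j):
    assert abs(discrete_laplacian(u, w)) < 1e-6
    assert abs(discrete_laplacian(lambda v: abs(v) ** 2, w) - 4) < 1e-6
```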


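11.23 Poisson integral solves Dirichlet problem on unit disk

Suppose $f\colon \partial\mathbf{D} \to \mathbf{C}$ is continuous. Define $u\colon \overline{\mathbf{D}} \to \mathbf{C}$ by
$$u(rz) = \begin{cases}(\mathcal{P}_r f)(z) & \text{if } 0 \le r < 1 \text{ and } z \in \partial\mathbf{D},\\ f(z) & \text{if } r = 1 \text{ and } z \in \partial\mathbf{D}.\end{cases}$$
Then $u$ is continuous on $\overline{\mathbf{D}}$, $u|_{\mathbf{D}}$ is harmonic, and $u|_{\partial\mathbf{D}} = f$.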


Proof Suppose $\zeta \in \partial\mathbf{D}$. To prove that $u$ is continuous at $\zeta$, we need to show that if $w \in \overline{\mathbf{D}}$ is close to $\zeta$, then $u(w)$ is close to $u(\zeta)$. Because $u|_{\partial\mathbf{D}} = f$ and $f$ is continuous on $\partial\mathbf{D}$, we do not need to worry about the case where $w \in \partial\mathbf{D}$. Thus assume $w \in \mathbf{D}$. We can write $w = rz$, where $r \in [0, 1)$ and $z \in \partial\mathbf{D}$. Now
$$\begin{aligned}|u(\zeta) - u(w)| &= |f(\zeta) - (\mathcal{P}_r f)(z)|\\ &\le |f(\zeta) - f(z)| + |f(z) - (\mathcal{P}_r f)(z)|.\end{aligned}$$
If $w$ is close to $\zeta$, then $z$ is also close to $\zeta$, and hence by the continuity of $f$ the first term in the last line above is small. Also, if $w$ is close to $\zeta$, then $r$ is close to 1, and hence by 11.18 the second term in the last line above is small. Thus if $w$ is close to $\zeta$, then $u(w)$ is close to $u(\zeta)$, as desired.

The function $u|_{\mathbf{D}}$ is harmonic on $\mathbf{D}$ (and hence continuous on $\mathbf{D}$) by 11.22.

The definition of $u$ immediately implies that $u|_{\partial\mathbf{D}} = f$.
Fourier Series of Smooth Functions

The Fourier series of a continuous function on $\partial\mathbf{D}$ need not converge pointwise (see Exercise 11). However, in this subsection we will see that Fourier series behave well for functions that are twice continuously differentiable.

First we need to define what we mean for a function on $\partial\mathbf{D}$ to be differentiable. The formal definition is given below, along with the introduction of the notation $\tilde{f}$ for the transfer of $f$ to $(-\pi, \pi]$ and $f^{[k]}$ for the transfer back to $\partial\mathbf{D}$ of the $k^{\text{th}}$ derivative of $\tilde{f}$.

The idea here is that we transfer a function defined on $\partial\mathbf{D}$ to $(-\pi, \pi]$, take the usual derivative there, then transfer back to $\partial\mathbf{D}$.


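11.24 Definition $\tilde{f}$; $k$ times continuously differentiable; $f^{[k]}$

Suppose $f\colon \partial\mathbf{D} \to \mathbf{C}$ is a complex-valued function on $\partial\mathbf{D}$ and $k \in \mathbf{Z}^+ \cup \{0\}$.
• Define $\tilde{f}\colon \mathbf{R} \to \mathbf{C}$ by $\tilde{f}(t) = f(e^{it})$.
• $f$ is called $k$ times continuously differentiable if $\tilde{f}$ is $k$ times differentiable everywhere on $\mathbf{R}$ and its $k^{\text{th}}$ derivative $\tilde{f}^{(k)}\colon \mathbf{R} \to \mathbf{C}$ is continuous.
• If $f$ is $k$ times continuously differentiable, then $f^{[k]}\colon \partial\mathbf{D} \to \mathbf{C}$ is defined by
$$f^{[k]}(e^{it}) = \tilde{f}^{(k)}(t)$$
for $t \in \mathbf{R}$. Here $\tilde{f}^{(0)}$ is defined to be $\tilde{f}$, which means that $f^{[0]} = f$.

Note that the function $\tilde{f}$ defined above is periodic on $\mathbf{R}$ because $\tilde{f}(t + 2\pi) = \tilde{f}(t)$ for all $t \in \mathbf{R}$. Thus all derivatives of $\tilde{f}$ are also periodic on $\mathbf{R}$.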


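11.25 Example Suppose $n \in \mathbf{Z}$ and $f\colon \partial\mathbf{D} \to \mathbf{C}$ is defined by $f(z) = z^n$. Then $\tilde{f}\colon \mathbf{R} \to \mathbf{C}$ is defined by $\tilde{f}(t) = e^{int}$.
If $k \in \mathbf{Z}^+$, then $\tilde{f}^{(k)}(t) = i^k n^k e^{int}$. Thus $f^{[k]}(z) = i^k n^k z^n$ for $z \in \partial\mathbf{D}$.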


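Our next result gives a formula for the Fourier coefficients of a derivative.

11.26 Fourier coefficients of differentiable functions

Suppose $k \in \mathbf{Z}^+$ and $f\colon \partial\mathbf{D} \to \mathbf{C}$ is $k$ times continuously differentiable. Then
$$(f^{[k]})^{\wedge}(n) = i^k n^k\hat{f}(n)$$
for every $n \in \mathbf{Z}$.

Proof First suppose $n = 0$. By the Fundamental Theorem of Calculus, we have
$$(f^{[k]})^{\wedge}(0) = \int_{-\pi}^{\pi} f^{[k]}(e^{it})\,\frac{dt}{2\pi} = \int_{-\pi}^{\pi}\tilde{f}^{(k)}(t)\,\frac{dt}{2\pi} = \frac{\tilde{f}^{(k-1)}(t)}{2\pi}\Big|_{t=-\pi}^{t=\pi} = 0,$$
which is the desired result for $n = 0$.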
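Now suppose $n \in \mathbf{Z}\setminus\{0\}$. Then
$$\begin{aligned}(f^{[k]})^{\wedge}(n) &= \int_{-\pi}^{\pi}\tilde{f}^{(k)}(t)e^{-int}\,\frac{dt}{2\pi}\\ &= \frac{\tilde{f}^{(k-1)}(t)e^{-int}}{2\pi}\Big|_{t=-\pi}^{t=\pi} + in\int_{-\pi}^{\pi}\tilde{f}^{(k-1)}(t)e^{-int}\,\frac{dt}{2\pi}\\ &= in\,(f^{[k-1]})^{\wedge}(n),\end{aligned}$$
where the second equality above follows from integration by parts, and the boundary term vanishes because $\tilde{f}^{(k-1)}$ and $t \mapsto e^{-int}$ are both $2\pi$-periodic. Iterating the equation above now produces the desired result.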
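Formula 11.26 can be spot-checked numerically. In this Python sketch the sample function $\tilde{f}(t) = e^{\cos t}$ is an arbitrary smooth choice (not from the text), and its first two derivatives are written out by hand; the helper `coefficient_from_t` (a hypothetical name) approximates a Fourier coefficient by an equally spaced average, which is extremely accurate for smooth periodic integrands.

```python
import cmath
import math


def coefficient_from_t(g, n, num_points=256):
    """Approximate (1/2pi) * integral_{-pi}^{pi} g(t) e^{-int} dt by an equally spaced average."""
    total = 0j
    for j in range(num_points):
        t = -math.pi + 2 * math.pi * j / num_points
        total += g(t) * cmath.exp(-1j * n * t)
    return total / num_points


def f_tilde(t):
    """Sample smooth function: f(e^{it}) = exp(cos t)."""
    return math.exp(math.cos(t))


def f_tilde_d1(t):
    """First derivative, so f^{[1]}(e^{it}) = -sin(t) exp(cos t)."""
    return -math.sin(t) * math.exp(math.cos(t))


def f_tilde_d2(t):
    """Second derivative, so f^{[2]}(e^{it}) = (sin(t)^2 - cos t) exp(cos t)."""
    return (math.sin(t) ** 2 - math.cos(t)) * math.exp(math.cos(t))


for n in range(-6, 7):
    f_hat = coefficient_from_t(f_tilde, n)
    # 11.26 with k = 1 and k = 2: (f^{[k]})^(n) = (i n)^k f^(n)
    assert abs(coefficient_from_t(f_tilde_d1, n) - 1j * n * f_hat) < 1e-12
    assert abs(coefficient_from_t(f_tilde_d2, n) - (1j * n) ** 2 * f_hat) < 1e-12
```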


Now we can prove the beautiful result that a twice continuously differentiable function on $\partial\mathbf{D}$ equals its Fourier series, with uniform convergence of the Fourier series. This conclusion holds with the weaker hypothesis that the function is continuously differentiable, but the proof is easier with the hypothesis used here.


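11.27 Fourier series of twice continuously differentiable functions converge

Suppose $f\colon \partial\mathbf{D} \to \mathbf{C}$ is twice continuously differentiable. Then
$$f(z) = \sum_{n=-\infty}^{\infty}\hat{f}(n)z^n$$
for all $z \in \partial\mathbf{D}$. Furthermore, the partial sums $\sum_{n=-K}^{M}\hat{f}(n)z^n$ converge uniformly on $\partial\mathbf{D}$ to $f$ as $K, M \to \infty$.

Proof If $n \in \mathbf{Z}\setminus\{0\}$, then
$$|\hat{f}(n)| = \frac{|(f^{[2]})^{\wedge}(n)|}{n^2} \le \frac{\|f^{[2]}\|_1}{n^2},\tag{11.28}$$
where the equality above follows from 11.26 and the inequality above follows from 11.9(c). Now 11.28 implies that
$$\sum_{n=-\infty}^{\infty}|\hat{f}(n)z^n| = \sum_{n=-\infty}^{\infty}|\hat{f}(n)| < \infty\tag{11.29}$$
for all $z \in \partial\mathbf{D}$. The inequality above implies that $\sum_{n=-\infty}^{\infty}\hat{f}(n)z^n$ converges and that the partial sums converge uniformly on $\partial\mathbf{D}$.

Furthermore, for each $z \in \partial\mathbf{D}$ we have
$$f(z) = \lim_{r\uparrow 1}\sum_{n=-\infty}^{\infty} r^{|n|}\hat{f}(n)z^n = \sum_{n=-\infty}^{\infty}\hat{f}(n)z^n,$$
where the first equality holds by 11.18 and 11.11, and the second equality holds by the Dominated Convergence Theorem (use counting measure on $\mathbf{Z}$) and 11.29.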
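The uniform convergence in 11.27 can be watched numerically. This Python sketch uses the illustrative function $\tilde{f}(t) = (t^2 - \pi^2)^2$ on $[-\pi, \pi]$ (an assumption for the example, not from the text); it is twice continuously differentiable on the circle but its third derivative jumps at $t = \pi$. Its Fourier coefficients were worked out by integration by parts as $\hat{f}(0) = \frac{8\pi^4}{15}$ and $\hat{f}(n) = -24(-1)^n/n^4$ for $n \neq 0$, which agrees with 11.26 since $\tilde{f}''(t) = 12t^2 - 4\pi^2$. The sketch checks the sup-norm error of the symmetric partial sums against the tail bound $\sum_{|n|>M}|\hat{f}(n)| \le 16/M^3$, in the spirit of 11.28 and 11.29. All helper names are hypothetical.

```python
import math

PI = math.pi


def f_tilde(t):
    """Sample function f(e^{it}) = (t^2 - pi^2)^2 for t in [-pi, pi] (an illustrative choice)."""
    return (t * t - PI * PI) ** 2


def coefficient(n):
    """Exact Fourier coefficients of f_tilde, worked out by integration by parts."""
    if n == 0:
        return 8 * PI ** 4 / 15
    return -24 * (-1) ** n / n ** 4


def partial_sum(t, M):
    """sum_{n=-M}^{M} f^(n) e^{int}; the coefficients are real and even in n, giving cosines."""
    return coefficient(0) + sum(2 * coefficient(n) * math.cos(n * t) for n in range(1, M + 1))


def sup_error(M, num_eval=2001):
    """Largest |f - partial sum| over num_eval equally spaced points of [-pi, pi]."""
    grid = [-PI + 2 * PI * k / (num_eval - 1) for k in range(num_eval)]
    return max(abs(f_tilde(t) - partial_sum(t, M)) for t in grid)


for M in (5, 10, 20, 40):
    # The tail sum_{|n| > M} |f^(n)| = 48 * sum_{n > M} n^(-4) is less than 16 / M^3.
    assert sup_error(M) <= 16 / M ** 3
```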
In 1923 Andrey Kolmogorov (1903–1987) published a proof that there exists a function in $L^1(\partial\mathbf{D})$ whose Fourier series diverges almost everywhere on $\partial\mathbf{D}$. Kolmogorov's result and the result in Exercise 11 probably led most mathematicians to suspect that there exists a continuous function on $\partial\mathbf{D}$ whose Fourier series diverges almost everywhere. However, in 1966 Lennart Carleson (1928–) showed that if $f \in L^2(\partial\mathbf{D})$ (and in particular if $f$ is continuous on $\partial\mathbf{D}$), then the Fourier series of $f$ converges to $f$ almost everywhere.


EXERCISES 11A 


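1 Prove that $(\bar{f})^{\wedge}(n) = \overline{\hat{f}(-n)}$ for all $f \in L^1(\partial\mathbf{D})$ and all $n \in \mathbf{Z}$.

2 Suppose $1 \le p \le \infty$ and $n \in \mathbf{Z}$.

(a) Show that the function $f \mapsto \hat{f}(n)$ is a bounded linear functional on $L^p(\partial\mathbf{D})$ with norm 1.

(b) Find all $f \in L^p(\partial\mathbf{D})$ such that $\|f\|_p = 1$ and $|\hat{f}(n)| = 1$.

3 Show that if $0 \le r < 1$ and $t \in \mathbf{R}$, then
$$P_r(e^{it}) = \frac{1 - r^2}{1 - 2r\cos t + r^2}.$$

4 Suppose $f \in \mathcal{L}^1(\partial\mathbf{D})$, $z \in \partial\mathbf{D}$, and $f$ is continuous at $z$. Prove that
$$\lim_{r\uparrow 1}(\mathcal{P}_r f)(z) = f(z).$$
[Here $\mathcal{L}^1(\partial\mathbf{D})$ means the complex version of $\mathcal{L}^1(\sigma)$. The result in this exercise differs from 11.18 because here we are assuming continuity only at a single point and we are not even assuming that $f$ is bounded, as compared to 11.18, which assumed continuity at all points of $\partial\mathbf{D}$.]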


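5 Suppose $f \in \mathcal{L}^1(\partial\mathbf{D})$, $z \in \partial\mathbf{D}$, $\lim_{t\downarrow 0} f(e^{it}z) = a$, and $\lim_{t\uparrow 0} f(e^{it}z) = b$. Prove that
$$\lim_{r\uparrow 1}(\mathcal{P}_r f)(z) = \frac{a + b}{2}.$$
[If $a \neq b$, then $f$ is said to have a jump discontinuity at $z$.]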


6 Prove that for each $p \in [1, \infty)$, there exists $f \in L^1(\partial\mathbf{D})$ such that


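7 Suppose $\zeta \in \partial\mathbf{D}$. Show that the function
$$w \mapsto \frac{1 - |w|^2}{|\zeta - w|^2}$$
is harmonic on $\mathbf{C}\setminus\{\zeta\}$ by finding an analytic function on $\mathbf{C}\setminus\{\zeta\}$ whose real part is the function above.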


8 Suppose $f\colon \partial\mathbf{D} \to \mathbf{R}$ is the function defined by
$$f(x, y) = x^4 y$$
for $(x, y) \in \mathbf{R}^2$ with $x^2 + y^2 = 1$. Find a polynomial $u$ of two variables $x, y$ such that $u$ is harmonic on $\mathbf{R}^2$ and $u|_{\partial\mathbf{D}} = f$.

[Of course, $u|_{\mathbf{D}}$ is the Poisson integral of $f$. However, here you are asked to find an explicit formula for $u$ in closed form, without involving or computing an integral. It may help to think of $f$ as defined by $f(z) = (\operatorname{Re} z)^4(\operatorname{Im} z)$ for $z \in \partial\mathbf{D}$.]


9 Find a formula (in closed form, not as an infinite sum) for $\mathcal{P}_r f$, where $f$ is the function in the second bullet point of Example 11.8.


10 Suppose $f\colon \partial\mathbf{D} \to \mathbf{C}$ is three times continuously differentiable. Prove that


for all $z \in \partial\mathbf{D}$.


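11 Let $C(\partial\mathbf{D})$ denote the Banach space of continuous functions from $\partial\mathbf{D}$ to $\mathbf{C}$, with the supremum norm. For $M \in \mathbf{Z}^+$, define a linear functional $\varphi_M\colon C(\partial\mathbf{D}) \to \mathbf{C}$ by
$$\varphi_M(f) = \sum_{n=-M}^{M}\hat{f}(n).$$
Thus $\varphi_M(f)$ is a partial sum of the Fourier series $\sum_{n=-\infty}^{\infty}\hat{f}(n)z^n$, evaluated at $z = 1$.

(a) Show that
$$\varphi_M(f) = \int_{-\pi}^{\pi} f(e^{it})\frac{\sin\bigl((M + \frac{1}{2})t\bigr)}{\sin\frac{t}{2}}\,\frac{dt}{2\pi}$$
for every $f \in C(\partial\mathbf{D})$ and every $M \in \mathbf{Z}^+$.

(b) Show that
$$\lim_{M\to\infty}\int_{-\pi}^{\pi}\Bigl|\frac{\sin\bigl((M + \frac{1}{2})t\bigr)}{\sin\frac{t}{2}}\Bigr|\,\frac{dt}{2\pi} = \infty.$$

(c) Show that $\lim_{M\to\infty}\|\varphi_M\| = \infty$.

(d) Show that there exists $f \in C(\partial\mathbf{D})$ such that $\lim_{M\to\infty}\sum_{n=-M}^{M}\hat{f}(n)$ does not exist (as an element of $\mathbf{C}$).

[Because the sum in part (d) is a partial sum of the Fourier series evaluated at $z = 1$, part (d) shows that the Fourier series of a continuous function on $\partial\mathbf{D}$ need not converge pointwise on $\partial\mathbf{D}$.
The family of functions (one for each $M \in \mathbf{Z}^+$) on $\partial\mathbf{D}$ defined by
$$e^{it} \mapsto \frac{\sin\bigl((M + \frac{1}{2})t\bigr)}{\sin\frac{t}{2}}$$
is called the Dirichlet kernel.]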
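12 Define $f\colon \partial\mathbf{D} \to \mathbf{R}$ by
$$f(z) = \begin{cases}1 & \text{if } \operatorname{Im} z > 0,\\ -1 & \text{if } \operatorname{Im} z < 0,\\ 0 & \text{if } \operatorname{Im} z = 0.\end{cases}$$

(a) Show that if $n \in \mathbf{Z}$, then
$$\hat{f}(n) = \begin{cases}-\frac{2i}{n\pi} & \text{if } n \text{ is odd},\\ 0 & \text{if } n \text{ is even}.\end{cases}$$
(b) Show that
$$(\mathcal{P}_r f)(z) = \frac{2}{\pi}\arctan\frac{2r\operatorname{Im} z}{1 - r^2}$$
for every $r \in [0, 1)$ and every $z \in \partial\mathbf{D}$.
(c) Verify that $\lim_{r\uparrow 1}(\mathcal{P}_r f)(z) = f(z)$ for every $z \in \partial\mathbf{D}$.
(d) Prove that $\mathcal{P}_r f$ does not converge uniformly to $f$ on $\partial\mathbf{D}$.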
11B Fourier Series and $L^p$ of Unit Circle

The last paragraph of the previous section mentioned the result that the Fourier series of a function in $L^2(\partial\mathbf{D})$ converges pointwise to the function almost everywhere. This terrific result had been an open question until 1966. Its proof is not included in this book, partly because the proof is difficult and partly because pointwise convergence has turned out to be less useful than norm convergence.

Thus we begin this section with the easy proof that the Fourier series converges in the norm of $L^2(\partial\mathbf{D})$. The remainder of this section then concentrates on issues connected with norm convergence.


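Orthonormal Basis for $L^2$ of Unit Circle

We already showed that $\{z^n\}_{n\in\mathbf{Z}}$ is an orthonormal family in $L^2(\partial\mathbf{D})$ (see 11.6). Now we show that $\{z^n\}_{n\in\mathbf{Z}}$ is an orthonormal basis of $L^2(\partial\mathbf{D})$.

11.30 orthonormal basis of $L^2(\partial\mathbf{D})$

The family $\{z^n\}_{n\in\mathbf{Z}}$ is an orthonormal basis of $L^2(\partial\mathbf{D})$.

Proof Suppose $f \in (\operatorname{span}\{z^n\}_{n\in\mathbf{Z}})^\perp$. Thus $\langle f, z^n\rangle = 0$ for all $n \in \mathbf{Z}$. In other words, $\hat{f}(n) = 0$ for all $n \in \mathbf{Z}$.

Suppose $\varepsilon > 0$. Let $g\colon \partial\mathbf{D} \to \mathbf{C}$ be a twice continuously differentiable function such that $\|f - g\|_2 < \varepsilon$. [To prove the existence of $g \in L^2(\partial\mathbf{D})$ with this property, first approximate $f$ by step functions as in 3.47, but use the $L^2$-norm instead of the $L^1$-norm. Then approximate the characteristic function of an interval as in 3.48, but again use the $L^2$-norm and round the corners of the graph in the proof of 3.48 to get a twice continuously differentiable function.]

Now
$$\begin{aligned}\|f\|_2 &\le \|f - g\|_2 + \|g\|_2\\ &= \|f - g\|_2 + \Bigl(\sum_{n=-\infty}^{\infty}|\hat{g}(n)|^2\Bigr)^{1/2}\\ &= \|f - g\|_2 + \Bigl(\sum_{n=-\infty}^{\infty}|\hat{g}(n) - \hat{f}(n)|^2\Bigr)^{1/2}\\ &\le \|f - g\|_2 + \|g - f\|_2\\ &< 2\varepsilon,\end{aligned}$$
where the second line above follows from 11.27, the third line above holds because $\hat{f}(n) = 0$ for all $n \in \mathbf{Z}$, and the fourth line above follows from Bessel's inequality (8.57).

Because the inequality above holds for all $\varepsilon > 0$, we conclude that $f = 0$. We have now shown that $(\operatorname{span}\{z^n\}_{n\in\mathbf{Z}})^\perp = \{0\}$. Hence $\overline{\operatorname{span}}\{z^n\}_{n\in\mathbf{Z}} = L^2(\partial\mathbf{D})$ by 8.42, which implies that $\{z^n\}_{n\in\mathbf{Z}}$ is an orthonormal basis of $L^2(\partial\mathbf{D})$.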
Now the convergence of the Fourier series of $f \in L^2(\partial\mathbf{D})$ to $f$ follows immediately from standard Hilbert space theory [see 8.63(a)] and the previous result. Thus with no further proof needed, we have the following important result.

11.31 convergence of Fourier series in the norm of $L^2(\partial\mathbf{D})$

Suppose $f \in L^2(\partial\mathbf{D})$. Then
$$f = \sum_{n=-\infty}^{\infty}\hat{f}(n)z^n,$$
where the infinite sum converges to $f$ in the norm of $L^2(\partial\mathbf{D})$.


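The next example is a spectacular application of Hilbert space theory and the orthonormal basis $\{z^n\}_{n\in\mathbf{Z}}$ of $L^2(\partial\mathbf{D})$. The evaluation of $\sum_{n=1}^{\infty}\frac{1}{n^2}$ had been an open question until Euler discovered in 1734 that this infinite sum equals $\frac{\pi^2}{6}$.

Euler's proof, which would not be considered sufficiently rigorous by today's standards, was quite different from the technique used in the example below.

11.32 Example $1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6}$

Define $f \in L^2(\partial\mathbf{D})$ by $f(e^{it}) = t$ for $t \in (-\pi, \pi]$. Then $\hat{f}(0) = \int_{-\pi}^{\pi} t\,\frac{dt}{2\pi} = 0$.

For $n \in \mathbf{Z}\setminus\{0\}$, we have
$$\begin{aligned}\hat{f}(n) &= \int_{-\pi}^{\pi} te^{-int}\,\frac{dt}{2\pi}\\ &= \frac{te^{-int}}{-in\,2\pi}\Big|_{t=-\pi}^{t=\pi} + \frac{1}{in}\int_{-\pi}^{\pi} e^{-int}\,\frac{dt}{2\pi}\\ &= \frac{(-1)^n i}{n},\end{aligned}$$
where the second line above follows from integration by parts. The equation above implies that
$$\sum_{n=-\infty}^{\infty}|\hat{f}(n)|^2 = 2\sum_{n=1}^{\infty}\frac{1}{n^2}.\tag{11.33}$$
Also,
$$\|f\|_2^2 = \int_{-\pi}^{\pi} t^2\,\frac{dt}{2\pi} = \frac{\pi^2}{3}.\tag{11.34}$$
Parseval's identity [8.63(c)] implies that the left side of 11.33 equals the left side of 11.34. Setting the right side of 11.33 equal to the right side of 11.34 shows that
$$\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}.$$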
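A small Python sketch of the Parseval computation in Example 11.32, using the coefficients $\hat{f}(n) = (-1)^n i/n$ derived above (the function names are hypothetical). It compares symmetric partial sums of the left side of 11.33 with $\|f\|_2^2 = \frac{\pi^2}{3}$ from 11.34; the missing tail $2\sum_{n>N}\frac{1}{n^2}$ is squeezed between $\frac{2}{N+1}$ and $\frac{2}{N}$ by comparison with integrals of $1/x^2$.

```python
import math


def coefficient(n):
    """f^(n) for f(e^{it}) = t from Example 11.32: 0 if n == 0, otherwise (-1)^n i / n."""
    return 0j if n == 0 else (-1) ** n * 1j / n


def parseval_partial_sum(N):
    """sum_{n=-N}^{N} |f^(n)|^2, a partial sum of the left side of 11.33."""
    return sum(abs(coefficient(n)) ** 2 for n in range(-N, N + 1))


norm_squared = math.pi ** 2 / 3  # ||f||_2^2, from 11.34

for N in (10, 100, 1000, 10000):
    gap = norm_squared - parseval_partial_sum(N)
    # The missing tail 2 * sum_{n > N} 1/n^2 lies between 2/(N + 1) and 2/N.
    assert 2 / (N + 1) <= gap <= 2 / N
```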
Convolution on Unit Circle

Recall that
$$(\mathcal{P}_r f)(z) = \int_{\partial\mathbf{D}} f(w)P_r(z\bar{w})\,d\sigma(w)\tag{11.35}$$
for $f \in L^1(\partial\mathbf{D})$, $0 \le r < 1$, and $z \in \partial\mathbf{D}$ (see 11.15). The kind of integral formula that appears in the result above is so useful that it gets a special name and notation.

11.36 Definition convolution; $f * g$

Suppose $f, g \in L^1(\partial\mathbf{D})$. The convolution of $f$ and $g$ is denoted $f * g$ and is the function defined by
$$(f * g)(z) = \int_{\partial\mathbf{D}} f(w)g(z\bar{w})\,d\sigma(w)$$
for those $z \in \partial\mathbf{D}$ for which the integral above makes sense.

Thus 11.35 states that $\mathcal{P}_r f = f * P_r$. Here $f \in L^1(\partial\mathbf{D})$ and $P_r \in L^\infty(\partial\mathbf{D})$; hence there is no problem with the integral in the definition of $f * P_r$ being defined for all $z \in \partial\mathbf{D}$. See Exercise 11 for an interpretation of convolution when the functions are transferred to the real line.

The definition above of the convolution of two functions allows both functions to be in $L^1(\partial\mathbf{D})$. The product of two functions in $L^1(\partial\mathbf{D})$ is not, in general, in $L^1(\partial\mathbf{D})$. Thus it is not obvious that the convolution of two functions in $L^1(\partial\mathbf{D})$ is defined anywhere. However, the next result shows that all is well.


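11.37 convolution of two functions in $L^1(\partial\mathbf{D})$ is in $L^1(\partial\mathbf{D})$

If $f, g \in L^1(\partial\mathbf{D})$, then $(f * g)(z)$ is defined for almost every $z \in \partial\mathbf{D}$. Furthermore, $f * g \in L^1(\partial\mathbf{D})$ and $\|f * g\|_1 \le \|f\|_1\|g\|_1$.

Proof Suppose $f, g \in \mathcal{L}^1(\partial\mathbf{D})$. The function $(w, z) \mapsto f(w)g(z\bar{w})$ is a measurable function on $\partial\mathbf{D}\times\partial\mathbf{D}$, as you are asked to show in Exercise 4. Now Tonelli's Theorem (5.28) and 11.17 imply that
$$\begin{aligned}\int_{\partial\mathbf{D}}\int_{\partial\mathbf{D}}|f(w)g(z\bar{w})|\,d\sigma(w)\,d\sigma(z) &= \int_{\partial\mathbf{D}}|f(w)|\int_{\partial\mathbf{D}}|g(z\bar{w})|\,d\sigma(z)\,d\sigma(w)\\ &= \int_{\partial\mathbf{D}}|f(w)|\,\|g\|_1\,d\sigma(w)\\ &= \|f\|_1\|g\|_1.\end{aligned}$$
The equation above implies that $\int_{\partial\mathbf{D}}|f(w)g(z\bar{w})|\,d\sigma(w) < \infty$ for almost every $z \in \partial\mathbf{D}$. Thus $(f * g)(z)$ is defined for almost every $z \in \partial\mathbf{D}$.

The equation above also implies that $\|f * g\|_1 \le \|f\|_1\|g\|_1$.

Soon we will apply convolution results to Poisson integrals. However, first we need to extend the previous result by bounding $\|f * g\|_p$ when $g \in L^p(\partial\mathbf{D})$.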
11.38 $L^p$-norm of a convolution

Suppose $1 \le p \le \infty$, $f \in L^1(\partial\mathbf{D})$, and $g \in L^p(\partial\mathbf{D})$. Then
$$\|f * g\|_p \le \|f\|_1\|g\|_p.$$

Proof We use the following result to estimate the norm in $L^p(\partial\mathbf{D})$:

If $F\colon \partial\mathbf{D} \to \mathbf{C}$ is measurable and $1 \le p \le \infty$, then
$$\|F\|_p = \sup\Bigl\{\int_{\partial\mathbf{D}}|Fh|\,d\sigma : h \in L^{p'}(\partial\mathbf{D}) \text{ and } \|h\|_{p'} = 1\Bigr\}.\tag{11.39}$$

Hölder's inequality (7.9) shows that the left side of the equation above is greater than or equal to the right side. The inequality in the other direction almost follows from 7.12, but 7.12 would require the hypothesis that $F \in L^p(\partial\mathbf{D})$ (and we want the equation above to hold even if $\|F\|_p = \infty$). To get around this problem, apply 7.12 to truncations of $F$ and use the Monotone Convergence Theorem (3.11); the details of verifying 11.39 are left to the reader.

Suppose $h \in L^{p'}(\partial\mathbf{D})$ and $\|h\|_{p'} = 1$. Then
$$\begin{aligned}\int_{\partial\mathbf{D}}|(f * g)(z)h(z)|\,d\sigma(z) &\le \int_{\partial\mathbf{D}}\int_{\partial\mathbf{D}}|f(w)g(z\bar{w})|\,d\sigma(w)\,|h(z)|\,d\sigma(z)\\ &= \int_{\partial\mathbf{D}}|f(w)|\int_{\partial\mathbf{D}}|g(z\bar{w})h(z)|\,d\sigma(z)\,d\sigma(w)\\ &\le \int_{\partial\mathbf{D}}|f(w)|\,\|g\|_p\,d\sigma(w)\\ &= \|f\|_1\|g\|_p,\end{aligned}\tag{11.40}$$
where the second line above follows from Tonelli's Theorem (5.28) and the third line follows from Hölder's inequality (7.9) and 11.17. Now 11.39 (with $F = f * g$) and 11.40 imply that $\|f * g\|_p \le \|f\|_1\|g\|_p$.


Order does not matter in convolutions, as we now prove.

11.41 convolution is commutative

Suppose $f, g \in L^1(\partial\mathbf{D})$. Then $f * g = g * f$.

Proof Suppose $z \in \partial\mathbf{D}$ is such that $(f * g)(z)$ is defined. Then
$$(f * g)(z) = \int_{\partial\mathbf{D}} f(w)g(z\bar{w})\,d\sigma(w) = \int_{\partial\mathbf{D}} f(z\bar{\zeta})g(\zeta)\,d\sigma(\zeta) = (g * f)(z),$$
where the second equality follows from making the substitution $\zeta = z\bar{w}$ (which implies that $w = z\bar{\zeta}$); the invariance of the integral under this substitution is explained in connection with 11.17.
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Now we come to a major result, stating that for p € [1, 00), the Poisson integrals 
of functions in L?(¢D) converge in the norm of L?(dD). This result fails for p = 00 
[see, for example, Exercise 12(d) in Section 11A]. 


11.42 if f © L?(OD), then P,f converges to f in L?(0D) 


Suppose 1 < p < coand f € L?(dD). Then Lia Uy — Prf lp = 0. 
ie 


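Proof Suppose $\varepsilon > 0$. Let $g\colon \partial\mathbf{D} \to \mathbf{C}$ be a continuous function on $\partial\mathbf{D}$ such that
$$\|f - g\|_p < \varepsilon.$$
By 11.18, there exists $R \in [0, 1)$ such that
$$\|g - P_r g\|_\infty < \varepsilon$$
for all $r \in (R, 1)$. If $r \in (R, 1)$, then
$$
\begin{aligned}
\|f - P_r f\|_p &\le \|f - g\|_p + \|g - P_r g\|_p + \|P_r g - P_r f\|_p\\
&< \varepsilon + \|g - P_r g\|_\infty + \|P_r(g - f)\|_p\\
&< 2\varepsilon + \|P_r * (g - f)\|_p\\
&\le 2\varepsilon + \|P_r\|_1\,\|g - f\|_p\\
&< 3\varepsilon,
\end{aligned}
$$
where the second line above uses $\|\cdot\|_p \le \|\cdot\|_\infty$ (valid because $\sigma(\partial\mathbf{D}) = 1$), the third line above is justified by 11.41, the fourth line above is justified by 11.38, and the last line above is justified by the equation $\|P_r\|_1 = 1$, which follows from 11.16(a) and 11.16(b). The last inequality implies that $\lim_{r \uparrow 1} \|f - P_r f\|_p = 0$.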
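As a numerical illustration of 11.42 (and of its failure for $p = \infty$), the sketch below takes $f$ to be the characteristic function of the upper half of the circle, $\{e^{i\theta} : 0 < \theta < \pi\}$. Summing its Fourier series $\sum_n r^{|n|}\hat{f}(n)e^{in\theta}$ gives the closed form $(P_r f)(e^{i\theta}) = \frac12 + \frac1\pi\arctan\bigl(2r\sin\theta/(1 - r^2)\bigr)$; this formula is our own computation rather than something stated in the text, and the midpoint-rule discretization is only an approximation. The estimated $L^1$ error shrinks as $r \uparrow 1$, while the largest sampled pointwise error stays near $1/2$ because of the jump at $\theta = 0$.

```python
import math

def poisson_of_half_circle(r, theta):
    """(P_r f)(e^{i theta}) for f = indicator of {e^{i theta}: 0 < theta < pi}, via the
    closed form 1/2 + (1/pi) arctan(2 r sin(theta) / (1 - r^2)) (our own computation)."""
    return 0.5 + math.atan(2 * r * math.sin(theta) / (1 - r * r)) / math.pi

def f(theta):
    return 1.0 if 0 < theta < math.pi else 0.0

def errors(r, m=20000):
    """Midpoint-rule estimate of ||f - P_r f||_1 (w.r.t. d(theta)/2pi) and the
    largest sampled value of |f - P_r f|."""
    thetas = [-math.pi + (k + 0.5) * 2 * math.pi / m for k in range(m)]
    diffs = [abs(f(t) - poisson_of_half_circle(r, t)) for t in thetas]
    return sum(diffs) / m, max(diffs)

results = {r: errors(r) for r in (0.5, 0.9, 0.99)}
for r, (l1, sup) in results.items():
    print(f"r = {r}: L1 error ~ {l1:.4f}, largest sampled error ~ {sup:.4f}")
```

The sampled maximum only approximates the essential supremum, but it already shows why the argument above cannot be run with $p = \infty$.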


As a consequence of the result above, we can now prove that functions in $L^1(\partial\mathbf{D})$, and thus functions in $L^p(\partial\mathbf{D})$ for every $p \in [1, \infty]$, are uniquely determined by their Fourier coefficients. Specifically, if $g, h \in L^1(\partial\mathbf{D})$ and $\hat{g}(n) = \hat{h}(n)$ for every $n \in \mathbf{Z}$, then applying the result below to $g - h$ shows that $g = h$.


11.43 functions are determined by their Fourier coefficients 


Suppose $f \in L^1(\partial\mathbf{D})$ and $\hat{f}(n) = 0$ for every $n \in \mathbf{Z}$. Then $f = 0$.


Proof Because $P_r f$ is defined in terms of Fourier coefficients (see 11.11), we know that $P_r f = 0$ for all $r \in [0, 1)$. Because $P_r f \to f$ in $L^1(\partial\mathbf{D})$ as $r \uparrow 1$ [by 11.42], this implies that $f = 0$.


Our next result shows that multiplication of Fourier coefficients corresponds to 
convolution of the corresponding functions. 
11.44 Fourier coefficients of a convolution


Suppose $f, g \in L^1(\partial\mathbf{D})$. Then
$$(f*g)^{\wedge}(n) = \hat{f}(n)\,\hat{g}(n)$$
for every $n \in \mathbf{Z}$.


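Proof First note that if $w \in \partial\mathbf{D}$ and $n \in \mathbf{Z}$, then
$$\int_{\partial\mathbf{D}} g(z\bar{w})\,\bar{z}^{\,n}\,d\sigma(z) = \int_{\partial\mathbf{D}} g(\zeta)\,\bar{\zeta}^{\,n}\,\bar{w}^{\,n}\,d\sigma(\zeta) = \bar{w}^{\,n}\,\hat{g}(n), \qquad (11.45)$$
where the first equality comes from the substitution $\zeta = z\bar{w}$ (equivalent to $z = \zeta w$), which is justified by the rotation invariance of $\sigma$. Now
$$
\begin{aligned}
(f*g)^{\wedge}(n) &= \int_{\partial\mathbf{D}} (f*g)(z)\,\bar{z}^{\,n}\,d\sigma(z)\\
&= \int_{\partial\mathbf{D}} \int_{\partial\mathbf{D}} f(w)\,g(z\bar{w})\,\bar{z}^{\,n}\,d\sigma(w)\,d\sigma(z)\\
&= \int_{\partial\mathbf{D}} f(w) \int_{\partial\mathbf{D}} g(z\bar{w})\,\bar{z}^{\,n}\,d\sigma(z)\,d\sigma(w)\\
&= \int_{\partial\mathbf{D}} f(w)\,\bar{w}^{\,n}\,\hat{g}(n)\,d\sigma(w)\\
&= \hat{f}(n)\,\hat{g}(n),
\end{aligned}
$$
where the interchange of integration order in the third equality is justified by the same steps used in the proof of 11.37 and the fourth equality above is justified by 11.45.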
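The identities 11.41 and 11.44 have an exact discrete analogue that is easy to check by machine. The sketch below is a hypothetical discretization, not part of the text: it replaces $\sigma$ by equal weights $1/N$ at the $N^{\text{th}}$ roots of unity $z_k = e^{2\pi i k/N}$. Because $z_k\bar{z}_j = z_{k-j}$, the convolution integral becomes a cyclic sum, and the discrete Fourier coefficients of $f*g$ equal the products of those of $f$ and $g$ up to rounding.

```python
import cmath
import math

N = 64                                             # number of sample points (our choice)
zs = [cmath.exp(2j * math.pi * k / N) for k in range(N)]

def conv(f, g):
    """Discrete analogue of (f*g)(z_k) = int f(w) g(z_k conj(w)) d sigma(w): since
    z_k * conj(z_j) = z_{k-j}, the integral becomes a cyclic sum."""
    return [sum(f[j] * g[(k - j) % N] for j in range(N)) / N for k in range(N)]

def coeff(f, n):
    """Discrete analogue of f^(n) = int f(z) conj(z)^n d sigma(z)."""
    return sum(f[k] * zs[k].conjugate() ** n for k in range(N)) / N

f = [abs(z.real) + 2 * z.imag for z in zs]         # arbitrary test data
g = [1.0 if z.imag > 0 else 0.0 for z in zs]
fg, gf = conv(f, g), conv(g, f)

comm_err = max(abs(a - b) for a, b in zip(fg, gf))                                   # 11.41
prod_err = max(abs(coeff(fg, n) - coeff(f, n) * coeff(g, n)) for n in range(-5, 6))  # 11.44
print(comm_err, prod_err)
```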


The next result could be proved by appropriate uses of Tonelli’s Theorem and 
Fubini’s Theorem. However, the slick proof technique used in the proof below should 
be useful in dealing with some of the exercises. 


11.46 convolution is associative 


Suppose $f, g, h \in L^1(\partial\mathbf{D})$. Then $(f*g)*h = f*(g*h)$.


Proof Suppose $n \in \mathbf{Z}$. Using 11.44 twice, we have
$$\bigl((f*g)*h\bigr)^{\wedge}(n) = (f*g)^{\wedge}(n)\,\hat{h}(n) = \hat{f}(n)\,\hat{g}(n)\,\hat{h}(n).$$
Similarly,
$$\bigl(f*(g*h)\bigr)^{\wedge}(n) = \hat{f}(n)\,(g*h)^{\wedge}(n) = \hat{f}(n)\,\hat{g}(n)\,\hat{h}(n).$$
Hence $(f*g)*h$ and $f*(g*h)$ have the same Fourier coefficients. Because functions in $L^1(\partial\mathbf{D})$ are determined by their Fourier coefficients (see 11.43), this implies that $(f*g)*h = f*(g*h)$.
EXERCISES 11B


1 Show that the family $\{e_k\}_{k\in\mathbf{Z}}$ of trigonometric functions defined by 11.1 is an orthonormal basis of $L^2((-\pi, \pi])$.


2 Use the result of Exercise 12(a) in Section 11A to show that
$$1 + \frac{1}{9} + \frac{1}{25} + \frac{1}{49} + \cdots = \frac{\pi^2}{8}.$$


3 Use techniques similar to Example 11.32 to evaluate $\displaystyle\sum_{n=1}^{\infty} \frac{1}{n^4}$.
[If you feel industrious, you may also want to evaluate $\sum_{n=1}^{\infty} 1/n^6$. Similar techniques work to evaluate $\sum_{n=1}^{\infty} 1/n^k$ for each positive even integer $k$. You can become famous if you figure out how to evaluate $\sum_{n=1}^{\infty} 1/n^3$, which currently is an open question.]


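4 Suppose $f, g\colon \partial\mathbf{D} \to \mathbf{C}$ are measurable functions. Prove that the function $(w, z) \mapsto f(w)\,g(z\bar{w})$ is a measurable function from $\partial\mathbf{D} \times \partial\mathbf{D}$ to $\mathbf{C}$.
[Here the $\sigma$-algebra on $\partial\mathbf{D} \times \partial\mathbf{D}$ is the usual product $\sigma$-algebra as defined in 5.2.]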


5 Where does the proof of 11.42 fail when $p = \infty$?


6 Suppose $f \in L^1(\partial\mathbf{D})$. Prove that $f$ is real valued (almost everywhere) if and only if $\hat{f}(-n) = \overline{\hat{f}(n)}$ for every $n \in \mathbf{Z}$.


7 Suppose $f \in L^1(\partial\mathbf{D})$. Show that $f \in L^2(\partial\mathbf{D})$ if and only if $\sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2 < \infty$.


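8 Suppose $f \in L^2(\partial\mathbf{D})$. Prove that $|f(z)| = 1$ for almost every $z \in \partial\mathbf{D}$ if and only if
$$\sum_{k=-\infty}^{\infty} \hat{f}(k)\,\overline{\hat{f}(k - n)} = \begin{cases} 1 & \text{if } n = 0,\\ 0 & \text{if } n \ne 0 \end{cases}$$
for all $n \in \mathbf{Z}$.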
9 For this exercise, for each $r \in [0, 1)$ think of $P_r$ as an operator on $L^2(\partial\mathbf{D})$.
(a) Show that $P_r$ is a self-adjoint compact operator for each $r \in [0, 1)$.
(b) For each $r \in (0, 1)$, find all eigenvalues and eigenvectors of $P_r$.
(c) Prove or disprove: $\lim_{r \uparrow 1} \|I - P_r\| = 0$.


10 Suppose $f \in L^1(\partial\mathbf{D})$. Define $T\colon L^2(\partial\mathbf{D}) \to L^2(\partial\mathbf{D})$ by $Tg = f*g$.
(a) Show that $T$ is a compact operator on $L^2(\partial\mathbf{D})$.
(b) Prove that $T$ is injective if and only if $\hat{f}(n) \ne 0$ for every $n \in \mathbf{Z}$.
(c) Find a formula for $T^*$.
(d) Prove: $T$ is self-adjoint if and only if all Fourier coefficients of $f$ are real.
(e) Show that $T$ is a normal operator.
11 Show that if $f, g \in L^1(\partial\mathbf{D})$ then
$$(f*g)^{\sim}(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \tilde{f}(x)\,\tilde{g}(t - x)\,dx$$
for those $t \in \mathbf{R}$ such that $(f*g)(e^{it})$ makes sense; here $(f*g)^{\sim}$, $\tilde{f}$, and $\tilde{g}$ denote the transfers to the real line as defined in 11.24.


12 Suppose $1 \le p \le \infty$. Prove that if $f \in L^p(\partial\mathbf{D})$ and $g \in L^{p'}(\partial\mathbf{D})$, then $f*g$ is a continuous function on $\partial\mathbf{D}$.


13 Suppose $g \in L^1(\partial\mathbf{D})$ is such that $\hat{g}(n) \ne 0$ for infinitely many $n \in \mathbf{Z}$. Prove that if $f \in L^1(\partial\mathbf{D})$ and $f*g = g$, then $f = 0$.


14 Show that there exists a two-sided sequence $\ldots, b_{-2}, b_{-1}, b_0, b_1, b_2, \ldots$ such that $\lim_{n\to\pm\infty} b_n = 0$ but there does not exist $f \in L^1(\partial\mathbf{D})$ with $\hat{f}(n) = b_n$ for all $n \in \mathbf{Z}$.


15 Prove that if $f, g \in L^2(\partial\mathbf{D})$, then
$$(fg)^{\wedge}(n) = \sum_{k=-\infty}^{\infty} \hat{f}(k)\,\hat{g}(n - k)$$
for every $n \in \mathbf{Z}$.
16 Suppose $f \in L^1(\partial\mathbf{D})$. Prove that $P_r(P_s f) = P_{rs} f$ for all $r, s \in [0, 1)$.


17 Suppose $p \in [1, \infty]$ and $f \in L^p(\partial\mathbf{D})$. Prove that if $0 \le r < s < 1$, then
$$\|P_r f\|_p \le \|P_s f\|_p.$$


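18 Prove Wirtinger's inequality: If $f\colon \mathbf{R} \to \mathbf{R}$ is a continuously differentiable $2\pi$-periodic function and $\int_{-\pi}^{\pi} f(t)\,dt = 0$, then
$$\int_{-\pi}^{\pi} \bigl(f(t)\bigr)^2\,dt \le \int_{-\pi}^{\pi} \bigl(f'(t)\bigr)^2\,dt,$$
with equality if and only if $f(t) = a\sin(t) + b\cos(t)$ for some constants $a, b$.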
11C Fourier Transform


Fourier Transform on $L^1(\mathbf{R})$


We now switch from consideration of functions defined on the unit circle $\partial\mathbf{D}$ to consideration of functions defined on the real line $\mathbf{R}$. Instead of dealing with Fourier coefficients and Fourier series, we now deal with Fourier transforms.

Recall that $\int_{-\infty}^{\infty} f(x)\,dx$ means $\int_{\mathbf{R}} f\,d\lambda$, where $\lambda$ denotes Lebesgue measure on $\mathbf{R}$, and similarly if a dummy variable other than $x$ is used (see 3.39). Similarly, $L^p(\mathbf{R})$ means $L^p(\lambda)$ (the version that allows the functions to be complex valued). Thus in this section,
$$\|f\|_p = \left(\int_{-\infty}^{\infty} |f(x)|^p\,dx\right)^{1/p}$$
for $1 \le p < \infty$.


11.47 Definition Fourier transform 


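For $f \in L^1(\mathbf{R})$, the Fourier transform of $f$ is the function $\hat{f}\colon \mathbf{R} \to \mathbf{C}$ defined by
$$\hat{f}(t) = \int_{-\infty}^{\infty} f(x)\,e^{-2\pi i t x}\,dx.$$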


We use the same notation $\hat{f}$ for the Fourier transform as we did for Fourier coefficients. The analogies that we will see between the two concepts make using the same notation reasonable. The context should make it clear whether this notation refers to Fourier transforms (when we are working with functions defined on $\mathbf{R}$) or to Fourier coefficients (when we are working with functions defined on $\partial\mathbf{D}$).

The factor $2\pi$ that appears in the exponent in the definition above of the Fourier transform is a normalization factor. Without this normalization, we would lose the beautiful result that $\|\hat{f}\|_2 = \|f\|_2$ (see 11.82). Another possible normalization, which is used by some books, is to define the Fourier transform of $f$ at $t$ to be
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-itx}\,dx.$$


There is no right or wrong way to do the normalization: pesky $\pi$'s will pop up somewhere regardless of the normalization or lack of normalization. However, the choice made in 11.47 seems to cause fewer problems than other choices.


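11.48 Example Fourier transforms
(a) Suppose $b < c$. If $t \in \mathbf{R}$, then
$$(\chi_{[b,c]})^{\wedge}(t) = \int_b^c e^{-2\pi i t x}\,dx = \begin{cases} \dfrac{i}{2\pi t}\bigl(e^{-2\pi i c t} - e^{-2\pi i b t}\bigr) & \text{if } t \ne 0,\\[1ex] c - b & \text{if } t = 0. \end{cases}$$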
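(b) Suppose $f(x) = e^{-2\pi|x|}$ for $x \in \mathbf{R}$. If $t \in \mathbf{R}$, then
$$
\begin{aligned}
\hat{f}(t) &= \int_{-\infty}^{\infty} e^{-2\pi|x|}\,e^{-2\pi i t x}\,dx\\
&= \int_{-\infty}^{0} e^{2\pi x}\,e^{-2\pi i t x}\,dx + \int_{0}^{\infty} e^{-2\pi x}\,e^{-2\pi i t x}\,dx\\
&= \frac{1}{2\pi(1 - it)} + \frac{1}{2\pi(1 + it)}\\
&= \frac{1}{\pi(t^2 + 1)}.
\end{aligned}
$$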
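The two formulas in Example 11.48 can be sanity-checked numerically. The sketch below approximates $\hat{f}(t)$ with a composite Simpson rule on a truncated interval; the interval $[b, c] = [-0.5, 1.5]$, the truncation $[-10, 10]$ for $e^{-2\pi|x|}$, and the helper names are our own choices.

```python
import cmath
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def fourier(F, t, lo, hi):
    """Approximate F^(t) = int F(x) e^{-2 pi i t x} dx, assuming F is negligible outside [lo, hi]."""
    return simpson(lambda x: F(x) * cmath.exp(-2j * math.pi * t * x), lo, hi)

b, c = -0.5, 1.5                                   # arbitrary interval for part (a)

def chi_hat(t):
    """Example 11.48(a): transform of the characteristic function of [b, c]."""
    if t == 0:
        return c - b
    return 1j / (2 * math.pi * t) * (cmath.exp(-2j * math.pi * c * t) - cmath.exp(-2j * math.pi * b * t))

def exp_hat(t):
    """Example 11.48(b): transform of e^{-2 pi |x|}."""
    return 1 / (math.pi * (t * t + 1))

for t in (0.0, 0.3, 1.7):
    err_a = abs(fourier(lambda x: 1.0, t, b, c) - chi_hat(t))
    err_b = abs(fourier(lambda x: math.exp(-2 * math.pi * abs(x)), t, -10.0, 10.0) - exp_hat(t))
    print(t, err_a, err_b)
```

The kink of $e^{-2\pi|x|}$ at $0$ falls on a panel boundary of the Simpson grid, so the rule stays accurate there.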


Recall that the Riemann–Lebesgue Lemma on the unit circle $\partial\mathbf{D}$ states that if $f \in L^1(\partial\mathbf{D})$, then $\lim_{n\to\pm\infty}\hat{f}(n) = 0$ (see 11.10). Now we come to the analogous result in the context of the real line.


11.49 Riemann–Lebesgue Lemma


Suppose $f \in L^1(\mathbf{R})$. Then $\hat{f}$ is uniformly continuous on $\mathbf{R}$. Furthermore,
$$\|\hat{f}\|_\infty \le \|f\|_1 \quad\text{and}\quad \lim_{t\to\pm\infty}\hat{f}(t) = 0.$$


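Proof Because $|e^{-2\pi i t x}| = 1$ for all $t \in \mathbf{R}$ and all $x \in \mathbf{R}$, the definition of the Fourier transform implies that if $t \in \mathbf{R}$ then
$$|\hat{f}(t)| \le \int_{-\infty}^{\infty} |f(x)|\,dx = \|f\|_1.$$
Thus $\|\hat{f}\|_\infty \le \|f\|_1$.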

If $f$ is the characteristic function of a bounded interval, then the formula in Example 11.48(a) shows that $\hat{f}$ is uniformly continuous on $\mathbf{R}$ and $\lim_{t\to\pm\infty}\hat{f}(t) = 0$. Thus the same result holds for finite linear combinations of such functions. Such finite linear combinations are called step functions (see 3.46).

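Now consider arbitrary $f \in L^1(\mathbf{R})$. There exists a sequence $f_1, f_2, \ldots$ of step functions in $L^1(\mathbf{R})$ such that $\lim_{k\to\infty}\|f - f_k\|_1 = 0$ (by 3.47). Applying the inequality $\|\hat{h}\|_\infty \le \|h\|_1$ from the first paragraph of this proof to $h = f - f_k$, we conclude that
$$\lim_{k\to\infty}\|\hat{f} - \hat{f}_k\|_\infty = 0.$$
In other words, the sequence $\hat{f}_1, \hat{f}_2, \ldots$ converges uniformly on $\mathbf{R}$ to $\hat{f}$. Because the uniform limit of uniformly continuous functions is uniformly continuous, we can conclude that $\hat{f}$ is uniformly continuous on $\mathbf{R}$. Furthermore, the uniform limit of functions on $\mathbf{R}$ each of which has limit $0$ at $\pm\infty$ also has limit $0$ at $\pm\infty$, completing the proof.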


The next result gives a condition that forces the Fourier transform of a function to be continuously differentiable. This result also gives a formula for the derivative of the Fourier transform. See Exercise 8 for a formula for the $n^{\text{th}}$ derivative.
11.50 derivative of a Fourier transform


Suppose $f \in L^1(\mathbf{R})$. Define $g\colon \mathbf{R} \to \mathbf{C}$ by $g(x) = x f(x)$. If $g \in L^1(\mathbf{R})$, then $\hat{f}$ is a continuously differentiable function on $\mathbf{R}$ and
$$(\hat{f})'(t) = -2\pi i\,\hat{g}(t)$$
for all $t \in \mathbf{R}$.


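Proof Fix $t \in \mathbf{R}$. Then
$$
\begin{aligned}
\lim_{s\to 0}\frac{\hat{f}(t+s) - \hat{f}(t)}{s} &= \lim_{s\to 0}\int_{-\infty}^{\infty} f(x)\,e^{-2\pi i t x}\left(\frac{e^{-2\pi i s x} - 1}{s}\right)dx\\
&= \int_{-\infty}^{\infty} f(x)\,e^{-2\pi i t x}\left(\lim_{s\to 0}\frac{e^{-2\pi i s x} - 1}{s}\right)dx\\
&= -2\pi i\int_{-\infty}^{\infty} x f(x)\,e^{-2\pi i t x}\,dx\\
&= -2\pi i\,\hat{g}(t),
\end{aligned}
$$
where the second equality is justified by using the inequality $|e^{i\theta} - 1| \le |\theta|$ (valid for all $\theta \in \mathbf{R}$, as the reader should verify) to show that $|(e^{-2\pi i s x} - 1)/s| \le 2\pi|x|$ for all $s \in \mathbf{R}\setminus\{0\}$ and all $x \in \mathbf{R}$; the hypothesis that $x f(x) \in L^1(\mathbf{R})$ and the Dominated Convergence Theorem (3.31) then allow for the interchange of the limit and the integral that is used in the second equality above.

The equation above shows that $\hat{f}$ is differentiable and that $(\hat{f})'(t) = -2\pi i\,\hat{g}(t)$ for all $t \in \mathbf{R}$. Because $\hat{g}$ is continuous on $\mathbf{R}$ (by 11.49), we can also conclude that $\hat{f}$ is continuously differentiable.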
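A quick numerical check of 11.50, using $f(x) = e^{-2\pi|x|}$ from Example 11.48(b), so that $g(x) = xe^{-2\pi|x|}$ is in $L^1(\mathbf{R})$. The sketch compares a central-difference approximation of $(\hat{f})'(0.7)$ with $-2\pi i\,\hat{g}(0.7)$, where $\hat{g}$ is computed by Simpson's rule; the point $t = 0.7$, the cutoff at $x = 12$, and the step sizes are our choices. Differentiating $1/(\pi(1+t^2))$ by hand gives $-2t/(\pi(1+t^2)^2)$, which both numbers should match.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def f_hat(t):
    """f^(t) for f(x) = e^{-2 pi |x|}, from Example 11.48(b)."""
    return 1 / (math.pi * (1 + t * t))

def g_hat(t):
    """g^(t) for g(x) = x e^{-2 pi |x|}.  Since g is odd, the cosine part cancels and
    g^(t) = -2i int_0^inf x e^{-2 pi x} sin(2 pi t x) dx (truncated at x = 12)."""
    def integrand(x):
        return x * math.exp(-2 * math.pi * x) * math.sin(2 * math.pi * t * x)
    return -2j * simpson(integrand, 0.0, 12.0)

t, step = 0.7, 1e-5
numeric = (f_hat(t + step) - f_hat(t - step)) / (2 * step)   # central difference for (f^)'(t)
predicted = -2j * math.pi * g_hat(t)                           # right side of 11.50
by_hand = -2 * t / (math.pi * (1 + t * t) ** 2)                # derivative of 1/(pi(1+t^2))
print(numeric, predicted, by_hand)
```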


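11.51 Example $e^{-\pi x^2}$ equals its Fourier transform


Suppose $f \in L^1(\mathbf{R})$ is defined by $f(x) = e^{-\pi x^2}$. Then the function $g\colon \mathbf{R} \to \mathbf{C}$ defined by $g(x) = x f(x) = x e^{-\pi x^2}$ is in $L^1(\mathbf{R})$. Hence 11.50 implies that if $t \in \mathbf{R}$ then
$$
\begin{aligned}
(\hat{f})'(t) &= -2\pi i\int_{-\infty}^{\infty} x e^{-\pi x^2}\,e^{-2\pi i t x}\,dx\\
&= \Bigl(i e^{-\pi x^2}\,e^{-2\pi i t x}\Bigr)\Big|_{x=-\infty}^{x=\infty} - 2\pi t\int_{-\infty}^{\infty} e^{-\pi x^2}\,e^{-2\pi i t x}\,dx\\
&= -2\pi t\,\hat{f}(t), \qquad (11.52)
\end{aligned}
$$
where the second equality follows from integration by parts (if you are nervous about doing an integration by parts from $-\infty$ to $\infty$, change each integral to be the limit as $M \to \infty$ of the integral from $-M$ to $M$).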
Note that $f'(t) = -2\pi t e^{-\pi t^2} = -2\pi t f(t)$. Combining this equation with 11.52 shows that
$$\left(\frac{\hat{f}}{f}\right)'(t) = \frac{f(t)(\hat{f})'(t) - f'(t)\hat{f}(t)}{\bigl(f(t)\bigr)^2} = \frac{-2\pi t f(t)\hat{f}(t) + 2\pi t f(t)\hat{f}(t)}{\bigl(f(t)\bigr)^2} = 0$$
for all $t \in \mathbf{R}$. Thus $\hat{f}/f$ is a constant function. In other words, there exists $c \in \mathbf{C}$ such that $\hat{f} = cf$. To evaluate $c$, note that
$$\hat{f}(0) = \int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1 = f(0), \qquad (11.53)$$
where the integral above is evaluated by writing its square as the integral times the same integral but using $y$ instead of $x$ for the dummy variable and then converting to polar coordinates ($dx\,dy = r\,dr\,d\theta$).

Clearly 11.53 implies that $c = 1$. Thus $\hat{f} = f$.
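The conclusion $\hat{f} = f$ for $f(x) = e^{-\pi x^2}$ can be confirmed numerically. Because $f$ is even, the imaginary part of $\hat{f}(t)$ vanishes and $\hat{f}(t) = \int_{-\infty}^{\infty} e^{-\pi x^2}\cos(2\pi t x)\,dx$; the sketch below evaluates this with Simpson's rule on $[-8, 8]$, an arbitrary truncation that is harmless because $e^{-64\pi}$ is negligible.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def gaussian(x):
    return math.exp(-math.pi * x * x)

def gaussian_hat(t):
    """f^(t) = int e^{-pi x^2} cos(2 pi t x) dx (the sine part vanishes because f is even)."""
    return simpson(lambda x: gaussian(x) * math.cos(2 * math.pi * t * x), -8.0, 8.0)

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, gaussian_hat(t), gaussian(t))
```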


The next result gives a formula for the Fourier transform of a derivative. See Exercise 9 for a formula for the Fourier transform of the $n^{\text{th}}$ derivative.


11.54 Fourier transform of a derivative 


Suppose $f \in L^1(\mathbf{R})$ is a continuously differentiable function and $f' \in L^1(\mathbf{R})$. If $t \in \mathbf{R}$, then
$$(f')^{\wedge}(t) = 2\pi i t\,\hat{f}(t).$$


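Proof Suppose $\varepsilon > 0$. Because $f$ and $f'$ are in $L^1(\mathbf{R})$, there exists $a \in \mathbf{R}$ such that
$$\int_a^{\infty} |f'(t)|\,dt < \varepsilon \quad\text{and}\quad |f(a)| < \varepsilon.$$
Now if $b > a$ then
$$|f(b)| = \left|\int_a^b f'(t)\,dt + f(a)\right| \le \int_a^{\infty} |f'(t)|\,dt + |f(a)| < 2\varepsilon.$$
Hence $\lim_{x\to\infty} f(x) = 0$. Similarly, $\lim_{x\to-\infty} f(x) = 0$.

If $t \in \mathbf{R}$, then
$$
\begin{aligned}
(f')^{\wedge}(t) &= \int_{-\infty}^{\infty} f'(x)\,e^{-2\pi i t x}\,dx\\
&= \Bigl(f(x)\,e^{-2\pi i t x}\Bigr)\Big|_{x=-\infty}^{x=\infty} + 2\pi i t\int_{-\infty}^{\infty} f(x)\,e^{-2\pi i t x}\,dx\\
&= 2\pi i t\,\hat{f}(t),
\end{aligned}
$$
where the second equality comes from integration by parts and the third equality holds because we showed in the paragraph above that $\lim_{x\to\pm\infty} f(x) = 0$.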


The next result gives formulas for the Fourier transforms of some algebraic 
transformations of a function. Proofs of these formulas are left to the reader. 




11.55 Fourier transforms of translations, rotations, and dilations 


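Suppose $f \in L^1(\mathbf{R})$, $b \in \mathbf{R}$, and $t \in \mathbf{R}$.


(a) If $g(x) = f(x - b)$ for all $x \in \mathbf{R}$, then $\hat{g}(t) = e^{-2\pi i b t}\,\hat{f}(t)$.


(b) If $g(x) = e^{2\pi i b x} f(x)$ for all $x \in \mathbf{R}$, then $\hat{g}(t) = \hat{f}(t - b)$.


(c) If $b \ne 0$ and $g(x) = f(bx)$ for all $x \in \mathbf{R}$, then $\hat{g}(t) = \dfrac{1}{|b|}\,\hat{f}\Bigl(\dfrac{t}{b}\Bigr)$.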
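The three formulas in 11.55 (whose proofs are Exercise 7) can be spot-checked with the Gaussian $f(x) = e^{-\pi x^2}$, for which $\hat{f} = f$ by Example 11.51. The sketch below is only a numerical check at one arbitrary choice of $b$ and $t$, not a proof; the helper `fourier` is our own name for a Simpson-rule approximation of the transform.

```python
import cmath
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def fourier(F, t, lo=-10.0, hi=10.0):
    """Approximate F^(t) = int F(x) e^{-2 pi i t x} dx by Simpson's rule on [lo, hi]."""
    return simpson(lambda x: F(x) * cmath.exp(-2j * math.pi * t * x), lo, hi)

def f(x):
    return math.exp(-math.pi * x * x)       # f^ = f by Example 11.51

b, t = 0.8, 0.35                             # arbitrary test values

def translated(x):                           # 11.55(a): g(x) = f(x - b)
    return f(x - b)

def modulated(x):                            # 11.55(b): g(x) = e^{2 pi i b x} f(x)
    return cmath.exp(2j * math.pi * b * x) * f(x)

def dilated(x):                              # 11.55(c): g(x) = f(bx)
    return f(b * x)

err_a = abs(fourier(translated, t) - cmath.exp(-2j * math.pi * b * t) * f(t))
err_b = abs(fourier(modulated, t) - f(t - b))
err_c = abs(fourier(dilated, t) - f(t / b) / abs(b))
print(err_a, err_b, err_c)
```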


11.56 Example Fourier transform of a rotation of an exponential function


Suppose $y > 0$, $x \in \mathbf{R}$, and $h(t) = e^{-2\pi y|t|}\,e^{2\pi i x t}$. To find the Fourier transform of $h$, first consider the function $g$ defined by $g(t) = e^{-2\pi y|t|}$. By 11.48(b) and 11.55(c), we have
$$\hat{g}(t) = \frac{1}{y}\cdot\frac{1}{\pi\bigl((t/y)^2 + 1\bigr)} = \frac{1}{\pi}\,\frac{y}{t^2 + y^2}. \qquad (11.57)$$
Now 11.55(b) implies that
$$\hat{h}(t) = \frac{1}{\pi}\,\frac{y}{(x - t)^2 + y^2}; \qquad (11.58)$$
note that $x$ is a constant in the definition of $h$, which has $t$ as the variable, but $x$ is the variable in 11.55(b). This slightly awkward permutation of variables is done in this example to make a later reference to 11.58 come out cleaner.
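Formula 11.58 can be checked directly from the definition of the Fourier transform. The sketch below integrates $h(s)e^{-2\pi i t s}$ over $|s| \le 8/y$, where the neglected tail is of size $e^{-16\pi}$; the values of $x$, $y$, and $t$ are arbitrary.

```python
import cmath
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def h_hat_numeric(t, x, y, n=40000):
    """int h(s) e^{-2 pi i t s} ds with h(s) = e^{-2 pi y |s|} e^{2 pi i x s}, truncated to |s| <= 8/y."""
    def integrand(s):
        return math.exp(-2 * math.pi * y * abs(s)) * cmath.exp(2j * math.pi * (x - t) * s)
    half_width = 8 / y
    return simpson(integrand, -half_width, half_width, n)

def h_hat_formula(t, x, y):
    """Right side of 11.58."""
    return y / (math.pi * ((x - t) ** 2 + y ** 2))

x, y = 0.4, 0.5
for t in (-1.0, 0.4, 2.0):
    print(t, abs(h_hat_numeric(t, x, y) - h_hat_formula(t, x, y)))
```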


The next result will be immensely useful later in this section. 


11.59 integral of a function times a Fourier transform 


Suppose $f, g \in L^1(\mathbf{R})$. Then
$$\int_{-\infty}^{\infty} \hat{f}(t)\,g(t)\,dt = \int_{-\infty}^{\infty} f(t)\,\hat{g}(t)\,dt.$$


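Proof Both integrals in the equation above make sense because $f, g \in L^1(\mathbf{R})$ and $\hat{f}, \hat{g} \in L^\infty(\mathbf{R})$ (by 11.49). Using the definition of the Fourier transform, we have
$$
\begin{aligned}
\int_{-\infty}^{\infty} \hat{f}(t)\,g(t)\,dt &= \int_{-\infty}^{\infty} g(t)\int_{-\infty}^{\infty} f(x)\,e^{-2\pi i t x}\,dx\,dt\\
&= \int_{-\infty}^{\infty} f(x)\int_{-\infty}^{\infty} g(t)\,e^{-2\pi i x t}\,dt\,dx\\
&= \int_{-\infty}^{\infty} f(x)\,\hat{g}(x)\,dx,
\end{aligned}
$$
where Tonelli's Theorem and Fubini's Theorem justify the second equality. Changing the dummy variable $x$ to $t$ in the last expression gives the desired result.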
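To see 11.59 in action, take $f(x) = e^{-2\pi|x|}$ (so $\hat{f}(t) = 1/(\pi(1+t^2))$ by Example 11.48(b)) and $g(x) = e^{-\pi x^2}$ (so $\hat{g} = g$ by Example 11.51). The two sides of 11.59 are then ordinary integrals, which the sketch below evaluates with Simpson's rule on $[-8, 8]$; the truncation and step size are our choices.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def f_hat(t):
    return 1 / (math.pi * (1 + t * t))       # Example 11.48(b), f(x) = e^{-2 pi |x|}

def g(x):
    return math.exp(-math.pi * x * x)         # Example 11.51: g^ = g

lhs = simpson(lambda t: f_hat(t) * g(t), -8.0, 8.0)                          # int f^ g
rhs = simpson(lambda t: math.exp(-2 * math.pi * abs(t)) * g(t), -8.0, 8.0)   # int f g^
print(lhs, rhs)
```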
Convolution on $\mathbf{R}$


Our next big goal is to prove the Fourier Inversion Formula. This remarkable formula, discovered by Fourier, states that if $f \in L^1(\mathbf{R})$ and $\hat{f} \in L^1(\mathbf{R})$, then
$$f(x) = \int_{-\infty}^{\infty} \hat{f}(t)\,e^{2\pi i x t}\,dt \qquad (11.60)$$
for almost every $x \in \mathbf{R}$. We will eventually prove this result (see 11.76), but first we need to develop some tools that will be used in the proof. To motivate these tools, we look at the right side of the equation above for fixed $x \in \mathbf{R}$ and see what we would need to prove that it equals $f(x)$.

To get from the right side of 11.60 to an expression involving $f$ rather than $\hat{f}$, we should be tempted to use 11.59. However, we cannot use 11.59 because the function $t \mapsto e^{2\pi i x t}$ is not in $L^1(\mathbf{R})$, which is a hypothesis needed for 11.59. Thus we throw in a convenient convergence factor, fixing $y > 0$ and considering the integral
$$\int_{-\infty}^{\infty} \hat{f}(t)\,e^{-2\pi y|t|}\,e^{2\pi i x t}\,dt. \qquad (11.61)$$
The convergence factor above is a good choice because for fixed $y > 0$ the function $t \mapsto e^{-2\pi y|t|}$ is in $L^1(\mathbf{R})$, and $\lim_{y\downarrow 0} e^{-2\pi y|t|} = 1$ for every $t \in \mathbf{R}$ (which means that 11.61 may be a good approximation to 11.60 for $y$ close to $0$).

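Now let's be rigorous. Suppose $f \in L^1(\mathbf{R})$. Fix $y > 0$ and $x \in \mathbf{R}$. Define $h\colon \mathbf{R} \to \mathbf{C}$ by $h(t) = e^{-2\pi y|t|}\,e^{2\pi i x t}$. Then $h \in L^1(\mathbf{R})$ and
$$
\begin{aligned}
\int_{-\infty}^{\infty} \hat{f}(t)\,e^{-2\pi y|t|}\,e^{2\pi i x t}\,dt &= \int_{-\infty}^{\infty} \hat{f}(t)\,h(t)\,dt\\
&= \int_{-\infty}^{\infty} f(t)\,\hat{h}(t)\,dt\\
&= \frac{1}{\pi}\int_{-\infty}^{\infty} f(t)\,\frac{y}{(x - t)^2 + y^2}\,dt, \qquad (11.62)
\end{aligned}
$$
where the second equality comes from 11.59 and the third equality comes from 11.58. We will come back to the specific formula in 11.62 later, but for now we use 11.62 as motivation for study of expressions of the form $\int_{-\infty}^{\infty} f(t)\,g(x - t)\,dt$. Thus we have been led to the following definition.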


11.63 Definition convolution; $f*g$


Suppose $f, g\colon \mathbf{R} \to \mathbf{C}$ are measurable functions. The convolution of $f$ and $g$ is denoted $f*g$ and is the function defined by
$$(f*g)(x) = \int_{-\infty}^{\infty} f(t)\,g(x - t)\,dt$$
for those $x \in \mathbf{R}$ for which the integral above makes sense.
Here we are using the same terminology and notation as was used for the convolution of functions on the unit circle. Recall that if $F, G \in L^1(\partial\mathbf{D})$, then
$$(F*G)(e^{i\theta}) = \int_{-\pi}^{\pi} F(e^{is})\,G\bigl(e^{i(\theta - s)}\bigr)\,\frac{ds}{2\pi}$$
for $\theta \in \mathbf{R}$ (see 11.36). The context should always indicate whether $f*g$ denotes convolution on the unit circle or convolution on the real line. The formal similarities between the two notions of convolution make many of the proofs transfer in either direction from one context to the other.
If $f, g \in L^1(\mathbf{R})$, then $(f*g)(x)$ is defined for almost every $x \in \mathbf{R}$, and furthermore $f*g \in L^1(\mathbf{R})$ and $\|f*g\|_1 \le \|f\|_1\,\|g\|_1$ (as you should verify by translating the proof of 11.37 to the context of $\mathbf{R}$).

If $f \in L^p(\mathbf{R})$ and $g \in L^{p'}(\mathbf{R})$, then Hölder's inequality (7.9) and the translation invariance of Lebesgue measure imply that $(f*g)(x)$ is defined for all $x \in \mathbf{R}$ and $\|f*g\|_\infty \le \|f\|_p\,\|g\|_{p'}$ (more is true: with these hypotheses, $f*g$ is a uniformly continuous function on $\mathbf{R}$, as you are asked to show in Exercise 10).

If $p \in (1, \infty]$, then neither $L^1(\mathbf{R})$ nor $L^p(\mathbf{R})$ is a subset of the other [unlike the inclusion $L^p(\partial\mathbf{D}) \subset L^1(\partial\mathbf{D})$]. Thus we do not yet know that $f*g$ makes sense for $f \in L^1(\mathbf{R})$ and $g \in L^p(\mathbf{R})$. However, the next result shows that all is well.


11.64 $L^p$-norm of a convolution


Suppose $1 \le p \le \infty$, $f \in L^1(\mathbf{R})$, and $g \in L^p(\mathbf{R})$. Then $(f*g)(x)$ is defined for almost every $x \in \mathbf{R}$. Furthermore,
$$\|f*g\|_p \le \|f\|_1\,\|g\|_p.$$


Proof First consider the case where $f(x) \ge 0$ and $g(x) \ge 0$ for almost every $x \in \mathbf{R}$. Thus $(f*g)(x)$ is defined for each $x \in \mathbf{R}$, although its value might equal $\infty$. Apply the proof of 11.38 to the context of $\mathbf{R}$, concluding that $\|f*g\|_p \le \|f\|_1\,\|g\|_p$ [which implies that $(f*g)(x) < \infty$ for almost every $x \in \mathbf{R}$].

Now consider arbitrary $f \in L^1(\mathbf{R})$ and $g \in L^p(\mathbf{R})$. Apply the case of the previous paragraph to $|f|$ and $|g|$ to get the desired conclusions.
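The inequality in 11.64 has an exact discrete counterpart that is convenient for experiments. If $f$ and $g$ are sampled on a grid of spacing $h$ and the integral defining $f*g$ is replaced by the Riemann sum $h\sum_j f_j g_{k-j}$, then Young's inequality for sequences gives $\|f*g\|_p \le \|f\|_1\|g\|_p$ for the corresponding weighted norms. The sketch below checks this for random data; the grid, the data, and the values of $p$ are arbitrary choices.

```python
import math
import random

def conv(f, g, h):
    """Riemann-sum analogue of (f*g)(x) = int f(t) g(x - t) dt on a grid of spacing h."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += h * fi * gj
    return out

def norm(v, p, h):
    """Grid version of the L^p norm: (h * sum |v_k|^p)^(1/p), or max |v_k| if p is infinite."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return (h * sum(abs(x) ** p for x in v)) ** (1 / p)

random.seed(0)
h = 0.01
f = [random.uniform(-1, 1) for _ in range(300)]
g = [random.uniform(-1, 1) for _ in range(500)]
fg = conv(f, g, h)
for p in (1, 2, 3.5, math.inf):
    print(p, norm(fg, p, h), "<=", norm(f, 1, h) * norm(g, p, h))
```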


The next proof, as is the case for several other proofs in this section, asks the 
reader to transfer the proof of the analogous result from the context of the unit circle 
to the context of the real line. This should require only minor adjustments of a proof 
from one of the two previous sections. The best way to learn this material is to write 
out for yourself the required proof in the context of the real line. 


11.65 convolution is commutative 


Suppose $f, g\colon \mathbf{R} \to \mathbf{C}$ are measurable functions and $x \in \mathbf{R}$ is such that $(f*g)(x)$ is defined. Then $(f*g)(x) = (g*f)(x)$.


Proof Adjust the proof of 11.41 to the context of R. 
Our next result shows that multiplication of Fourier transforms corresponds to convolution of the corresponding functions.


11.66 Fourier transform of a convolution


Suppose $f, g \in L^1(\mathbf{R})$. Then
$$(f*g)^{\wedge}(t) = \hat{f}(t)\,\hat{g}(t)$$
for every $t \in \mathbf{R}$.


Proof Adjust the proof of 11.44 to the context of $\mathbf{R}$.
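For the Gaussian $f(x) = e^{-\pi x^2}$, completing the square gives $(f*f)(x) = e^{-\pi x^2/2}/\sqrt{2}$ (our own computation), and 11.66 together with Example 11.51 predicts $(f*f)^{\wedge}(t) = \hat{f}(t)^2 = e^{-2\pi t^2}$. The sketch below checks both statements numerically with Simpson's rule; the sample points and truncations are arbitrary.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def f(x):
    return math.exp(-math.pi * x * x)             # f^ = f by Example 11.51

def conv_numeric(x):
    """(f*f)(x) = int f(t) f(x - t) dt, truncated to |t| <= 10."""
    return simpson(lambda t: f(t) * f(x - t), -10.0, 10.0)

def conv_closed(x):
    """Completing the square: (f*f)(x) = e^{-pi x^2 / 2} / sqrt(2)."""
    return math.exp(-math.pi * x * x / 2) / math.sqrt(2)

def transform_even(F, t, half_width=12.0):
    """F^(t) for an even, rapidly decaying F: int F(x) cos(2 pi t x) dx."""
    return simpson(lambda x: F(x) * math.cos(2 * math.pi * t * x), -half_width, half_width)

conv_errs = [abs(conv_numeric(x) - conv_closed(x)) for x in (0.0, 0.7, 1.9)]
# 11.66 predicts (f*f)^(t) = f^(t)^2 = e^{-2 pi t^2}
thm_errs = [abs(transform_even(conv_closed, t) - f(t) ** 2) for t in (0.0, 0.6, 1.1)]
print(conv_errs, thm_errs)
```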


Poisson Kernel on Upper Half-Plane 


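As usual, we identify $\mathbf{R}^2$ with $\mathbf{C}$, as illustrated in the following definition. We will see that the upper half-plane plays a role in the context of $\mathbf{R}$ similar to the role that the open unit disk plays in the context of $\partial\mathbf{D}$.


11.67 Definition $\mathbf{H}$; upper half-plane
• $\mathbf{H}$ denotes the open upper half-plane in $\mathbf{R}^2$:
$$\mathbf{H} = \{(x, y) \in \mathbf{R}^2 : y > 0\} = \{z \in \mathbf{C} : \operatorname{Im} z > 0\}.$$
• $\partial\mathbf{H}$ is identified with the real line:
$$\partial\mathbf{H} = \{(x, y) \in \mathbf{R}^2 : y = 0\} = \{z \in \mathbf{C} : \operatorname{Im} z = 0\} = \mathbf{R}.$$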


Recall that we defined a family of functions on $\partial\mathbf{D}$ called the Poisson kernel on $\mathbf{D}$ (see 11.14, where the family is called the Poisson kernel on $\mathbf{D}$ because $0 \le r < 1$ and $\zeta \in \partial\mathbf{D}$ implies $r\zeta \in \mathbf{D}$). Now we are ready to define a family of functions on $\mathbf{R}$ that is called the Poisson kernel on $\mathbf{H}$ [because $x \in \mathbf{R}$ and $y > 0$ implies $(x, y) \in \mathbf{H}$].

The following definition is motivated by 11.62. The notation $P_r$ for the Poisson kernel on $\mathbf{D}$ and $P_y$ for the Poisson kernel on $\mathbf{H}$ is potentially ambiguous (what is $P_{1/2}$?), but the intended meaning should always be clear from the context.


11.68 Definition $P_y$; Poisson kernel


• For $y > 0$, define $P_y\colon \mathbf{R} \to (0, \infty)$ by
$$P_y(x) = \frac{1}{\pi}\,\frac{y}{x^2 + y^2}.$$
• The family of functions $\{P_y\}_{y>0}$ is called the Poisson kernel on $\mathbf{H}$.
The properties of the Poisson kernel on $\mathbf{H}$ listed in the result below should be compared to the corresponding properties (see 11.16) of the Poisson kernel on $\mathbf{D}$.


11.69 properties of $P_y$


(a) $P_y(x) > 0$ for all $y > 0$ and all $x \in \mathbf{R}$.


(b) $\displaystyle\int_{-\infty}^{\infty} P_y(x)\,dx = 1$ for each $y > 0$.


(c) $\displaystyle\lim_{y\downarrow 0}\int_{\{x\in\mathbf{R} : |x| \ge \delta\}} P_y(x)\,dx = 0$ for each $\delta > 0$.


Proof Part (a) follows immediately from the definition of $P_y(x)$ given in 11.68. Parts (b) and (c) follow from explicitly evaluating the integrals, using the result that for each $y > 0$, an anti-derivative of $P_y(x)$ (as a function of $x$) is $\frac{1}{\pi}\arctan\frac{x}{y}$.
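The antiderivative $\frac{1}{\pi}\arctan\frac{x}{y}$ makes the integrals in 11.69 explicit. The sketch below checks the antiderivative against a Simpson-rule integral of $P_{0.3}$, confirms that the total mass is essentially $1$, and shows the mass outside $[-\delta, \delta]$ shrinking as $y \downarrow 0$; the choices $y = 0.3$ and $\delta = 0.05$ are arbitrary.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def poisson(y, x):
    """P_y(x) = (1/pi) y / (x^2 + y^2), as in 11.68."""
    return y / (math.pi * (x * x + y * y))

def mass(y, a, b):
    """int_a^b P_y(x) dx via the antiderivative (1/pi) arctan(x/y)."""
    return (math.atan(b / y) - math.atan(a / y)) / math.pi

def tail(y, delta):
    """int over |x| >= delta of P_y, computed as 1 - int_{-delta}^{delta} P_y using 11.69(b)."""
    return 1 - mass(y, -delta, delta)

quad_vs_formula = abs(simpson(lambda x: poisson(0.3, x), -3.0, 3.0) - mass(0.3, -3.0, 3.0))
near_total = mass(0.3, -1e9, 1e9)                          # 11.69(b): close to 1
tails = [tail(y, 0.05) for y in (1.0, 0.1, 0.01, 0.001)]   # 11.69(c): decreasing toward 0
print(quad_vs_formula, near_total, tails)
```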


If $p \in [1, \infty]$ and $f \in L^p(\mathbf{R})$ and $y > 0$, then $f * P_y$ makes sense because $P_y \in L^{p'}(\mathbf{R})$. Thus the following definition makes sense.


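11.70 Definition $P_y f$
For $f \in L^p(\mathbf{R})$ for some $p \in [1, \infty]$ and for $y > 0$, define $P_y f\colon \mathbf{R} \to \mathbf{C}$ by
$$(P_y f)(x) = \int_{-\infty}^{\infty} f(t)\,P_y(x - t)\,dt = \frac{1}{\pi}\int_{-\infty}^{\infty} f(t)\,\frac{y}{(x - t)^2 + y^2}\,dt$$
for $x \in \mathbf{R}$. In other words, $P_y f = f * P_y$.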


The next result is analogous to 11.18, except that now we need to include in the hypothesis that our function is uniformly continuous and bounded (those conditions follow automatically from continuity in the context of the unit circle).

For the proof of the result below, you should use the properties in 11.69 instead of the corresponding properties in 11.16.


When Napoleon appointed Fourier to an administrative position in 1806, Siméon Poisson (1781–1840) was appointed to the professor position at École Polytechnique vacated by Fourier. Poisson published over 300 mathematical papers during his lifetime.


11.71 if $f$ is uniformly continuous and bounded, then $\lim_{y\downarrow 0}\|f - P_y f\|_\infty = 0$


Suppose $f\colon \mathbf{R} \to \mathbf{C}$ is uniformly continuous and bounded. Then $P_y f$ converges uniformly to $f$ on $\mathbf{R}$ as $y \downarrow 0$.


Proof Adjust the proof of 11.18 to the context of R. 
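The function $u$ defined in the result below is called the Poisson integral of $f$ on $\mathbf{H}$.


11.72 Poisson integral is harmonic


Suppose $f \in L^p(\mathbf{R})$ for some $p \in [1, \infty]$. Define $u\colon \mathbf{H} \to \mathbf{C}$ by
$$u(x, y) = (P_y f)(x)$$
for $x \in \mathbf{R}$ and $y > 0$. Then $u$ is harmonic on $\mathbf{H}$.


Proof First we consider the case where $f$ is real valued. For $x \in \mathbf{R}$ and $y > 0$, let $z = x + iy$. Then
$$\frac{y}{(x - t)^2 + y^2} = \operatorname{Im}\frac{1}{t - z}$$
for $t \in \mathbf{R}$. Thus
$$u(x, y) = \operatorname{Im}\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{f(t)}{t - z}\,dt.$$
The function $z \mapsto \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{f(t)}{t - z}\,dt$ is analytic on $\mathbf{H}$; its derivative is the function $z \mapsto \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{f(t)}{(t - z)^2}\,dt$ (justification for this statement is in the next paragraph). In other words, we can differentiate (with respect to $z$) under the integral sign in the expression above. Because $u$ is the imaginary part of an analytic function, $u$ is harmonic on $\mathbf{H}$, as desired.

To justify the differentiation under the integral sign, define $g\colon \mathbf{H} \to \mathbf{C}$ by $g(z) = \int_{-\infty}^{\infty} \frac{f(t)}{t - z}\,dt$ and fix $z \in \mathbf{H}$. If $w \in \mathbf{H}$ and $w \ne z$, then
$$\frac{g(w) - g(z)}{w - z} - \int_{-\infty}^{\infty} \frac{f(t)}{(t - z)^2}\,dt = \int_{-\infty}^{\infty} f(t)\left(\frac{1}{(t - z)(t - w)} - \frac{1}{(t - z)^2}\right)dt.$$
As $w \to z$, the function $t \mapsto \frac{1}{(t - z)(t - w)} - \frac{1}{(t - z)^2}$ goes to $0$ in the norm of $L^{p'}(\mathbf{R})$. Thus Hölder's inequality (7.9) and the equation above imply that $g'(z)$ exists and that $g'(z) = \int_{-\infty}^{\infty} \frac{f(t)}{(t - z)^2}\,dt$, as desired.

If $f$ is complex valued, apply the real-valued case to $\operatorname{Re} f$ and $\operatorname{Im} f$; because $P_y f = P_y(\operatorname{Re} f) + i\,P_y(\operatorname{Im} f)$, the function $u$ is then a linear combination of harmonic functions and hence is harmonic.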


We have now solved the Dirichlet problem on the half-plane for uniformly continuous, bounded functions on $\mathbf{R}$ (see 11.21 for the statement of the Dirichlet problem).


11.73 Poisson integral solves Dirichlet problem on half-plane 


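Suppose $f\colon \mathbf{R} \to \mathbf{C}$ is uniformly continuous and bounded. Define $u\colon \overline{\mathbf{H}} \to \mathbf{C}$ by
$$u(x, y) = \begin{cases} (P_y f)(x) & \text{if } x \in \mathbf{R} \text{ and } y > 0,\\ f(x) & \text{if } x \in \mathbf{R} \text{ and } y = 0. \end{cases}$$
Then $u$ is continuous on $\overline{\mathbf{H}}$, $u|_{\mathbf{H}}$ is harmonic, and $u|_{\partial\mathbf{H}} = f$.


Proof Adjust the proof of 11.23 to the context of $\mathbf{R}$; now you will need to use 11.71 and 11.72 instead of the corresponding results for the unit circle.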
The next result, which states that the Poisson integrals of functions in $L^p(\mathbf{R})$ converge in the norm of $L^p(\mathbf{R})$, will be a major tool in proving the Fourier Inversion Formula and other results later in this section.

For the result below, the proof of the corresponding result on the unit circle (11.42) does not transfer to the context of $\mathbf{R}$ (because the inequality $\|\cdot\|_p \le \|\cdot\|_\infty$ fails in the context of $\mathbf{R}$).


Poisson and Fourier are two of the 72 mathematicians/scientists whose names are prominently inscribed on the Eiffel Tower in Paris.


11.74 if $f \in L^p(\mathbf{R})$, then $P_y f$ converges to $f$ in $L^p(\mathbf{R})$


Suppose $1 \le p < \infty$ and $f \in L^p(\mathbf{R})$. Then $\displaystyle\lim_{y\downarrow 0}\|f - P_y f\|_p = 0$.


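Proof If $y > 0$ and $x \in \mathbf{R}$, then
$$
\begin{aligned}
|f(x) - (P_y f)(x)| &= \left|f(x) - \int_{-\infty}^{\infty} f(x - t)\,P_y(t)\,dt\right|\\
&= \left|\int_{-\infty}^{\infty} \bigl(f(x) - f(x - t)\bigr)\,P_y(t)\,dt\right|\\
&\le \left(\int_{-\infty}^{\infty} |f(x) - f(x - t)|^p\,P_y(t)\,dt\right)^{1/p}, \qquad (11.75)
\end{aligned}
$$
where the first equality uses 11.65, the second equality uses 11.69(b), and the inequality comes from applying 7.10 to the measure $P_y\,dt$ (note that the measure of $\mathbf{R}$ with respect to this measure is $1$).

Define $h\colon \mathbf{R} \to [0, \infty)$ by
$$h(t) = \int_{-\infty}^{\infty} |f(x) - f(x - t)|^p\,dx.$$
Then $h$ is a bounded function that is uniformly continuous on $\mathbf{R}$ [by Exercise 23(a) in Section 7A]. Furthermore, $h(0) = 0$.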

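Raising both sides of 11.75 to the $p^{\text{th}}$ power and then integrating over $\mathbf{R}$ with respect to $x$, we have
$$
\begin{aligned}
\|f - P_y f\|_p^p &\le \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |f(x) - f(x - t)|^p\,P_y(t)\,dt\,dx\\
&= \int_{-\infty}^{\infty} P_y(t)\int_{-\infty}^{\infty} |f(x) - f(x - t)|^p\,dx\,dt\\
&= \int_{-\infty}^{\infty} P_y(t)\,h(t)\,dt\\
&= (P_y h)(0),
\end{aligned}
$$
where the first equality follows from Tonelli's Theorem and the last equality holds because $P_y$ is an even function.

Now 11.71 implies that $\lim_{y\downarrow 0}(P_y h)(0) = h(0) = 0$. Hence the last inequality above implies that $\lim_{y\downarrow 0}\|f - P_y f\|_p = 0$.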
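As an illustration of 11.74 with $p = 1$, take $f = \chi_{[0,1]}$. Using the antiderivative from the proof of 11.69, $(P_y f)(x) = \frac{1}{\pi}\bigl[\arctan\frac{x}{y} - \arctan\frac{x-1}{y}\bigr]$. The sketch below estimates $\|f - P_y f\|_1$ with a midpoint rule on $[-100, 101]$ (the truncation and grid are our choices), and also evaluates $f - P_y f$ just to the right of the jump at $0$, where the error stays near $1/2$; this mirrors the failure of 11.74 for $p = \infty$.

```python
import math

def poisson_of_indicator(y, x):
    """(P_y f)(x) for f = chi_[0,1]: int_0^1 P_y(x - t) dt = (1/pi)[arctan(x/y) - arctan((x - 1)/y)]."""
    return (math.atan(x / y) - math.atan((x - 1) / y)) / math.pi

def indicator(x):
    return 1.0 if 0 <= x <= 1 else 0.0

def l1_error(y, lo=-100.0, hi=101.0, m=201000):
    """Midpoint-rule estimate of int_lo^hi |f - P_y f|; the neglected tails are O(y/100)."""
    step = (hi - lo) / m
    total = 0.0
    for k in range(m):
        x = lo + (k + 0.5) * step
        total += abs(indicator(x) - poisson_of_indicator(y, x))
    return total * step

ys = (0.2, 0.05, 0.01)
l1_errs = [l1_error(y) for y in ys]
# Just to the right of the jump at 0, the pointwise error stays near 1/2 for every y.
jump_errs = [1 - poisson_of_indicator(y, 1e-9 * y) for y in ys]
print(l1_errs, jump_errs)
```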
Fourier Inversion Formula


Now we can prove the remarkable Fourier Inversion Formula.


11.76 Fourier Inversion Formula


Suppose $f \in L^1(\mathbf{R})$ and $\hat{f} \in L^1(\mathbf{R})$. Then
$$f(x) = \int_{-\infty}^{\infty} \hat{f}(t)\,e^{2\pi i x t}\,dt$$
for almost every $x \in \mathbf{R}$. In other words,
$$f(x) = (\hat{f})^{\wedge}(-x)$$
for almost every $x \in \mathbf{R}$.


Proof Equation 11.62 states that
$$\int_{-\infty}^{\infty} \hat{f}(t)\,e^{-2\pi y|t|}\,e^{2\pi i x t}\,dt = (P_y f)(x) \qquad (11.77)$$
for every $x \in \mathbf{R}$ and every $y > 0$.

Because $\hat{f} \in L^1(\mathbf{R})$, the Dominated Convergence Theorem (3.31) implies that for every $x \in \mathbf{R}$, the left side of 11.77 has limit $(\hat{f})^{\wedge}(-x)$ as $y \downarrow 0$.

Because $f \in L^1(\mathbf{R})$, 11.74 implies that $\lim_{y\downarrow 0}\|f - P_y f\|_1 = 0$. Now 7.23 implies that there is a sequence of positive numbers $y_1, y_2, \ldots$ such that $\lim_{n\to\infty} y_n = 0$ and $\lim_{n\to\infty}(P_{y_n} f)(x) = f(x)$ for almost every $x \in \mathbf{R}$.

Combining the results in the two previous paragraphs and equation 11.77 shows that $f(x) = (\hat{f})^{\wedge}(-x)$ for almost every $x \in \mathbf{R}$.
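The proof above can be watched numerically. For $f(x) = e^{-2\pi|x|}$ we have $\hat{f} \in L^1(\mathbf{R})$ by Example 11.48(b), so the left side of 11.77 can be computed directly; as $y \downarrow 0$ it should approach $f(x)$. The sketch below does this at $x = 0.5$ with Simpson's rule; the point $x$, the values of $y$, the cutoff $T = 6/y$, and the step size are our choices.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def f_hat(t):
    return 1 / (math.pi * (1 + t * t))        # Example 11.48(b): f(x) = e^{-2 pi |x|}

def left_side_of_11_77(x, y):
    """int f^(t) e^{-2 pi y |t|} e^{2 pi i x t} dt.  The imaginary part of the integrand is
    odd in t, so this equals 2 int_0^T f^(t) e^{-2 pi y t} cos(2 pi x t) dt; with T = 6/y
    the neglected tail is below e^{-12 pi}."""
    def integrand(t):
        return f_hat(t) * math.exp(-2 * math.pi * y * t) * math.cos(2 * math.pi * x * t)
    T = 6 / y
    n = 2 * int(100 * T)                      # even, step size about 0.005
    return 2 * simpson(integrand, 0.0, T, n)

x = 0.5
target = math.exp(-2 * math.pi * abs(x))      # f(x)
errs = [abs(left_side_of_11_77(x, y) - target) for y in (0.2, 0.05, 0.01)]
print(errs)
```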


The Fourier transform of a function in $L^1(\mathbf{R})$ is a uniformly continuous function on $\mathbf{R}$ (by 11.49). Thus the Fourier Inversion Formula (11.76) implies that if $f \in L^1(\mathbf{R})$ and $\hat{f} \in L^1(\mathbf{R})$, then $f$ can be modified on a set of measure zero to become a uniformly continuous function on $\mathbf{R}$.

The Fourier Inversion Formula now allows us to calculate the Fourier transform of $P_y$ for each $y > 0$.


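11.78 Example Fourier transform of $P_y$
Suppose $y > 0$. Define $f\colon \mathbf{R} \to (0, 1]$ by
$$f(t) = e^{-2\pi y|t|}.$$
Then $\hat{f} = P_y$ by 11.57. Hence both $f$ and $\hat{f}$ are in $L^1(\mathbf{R})$. Thus we can apply the Fourier Inversion Formula (11.76), concluding that
$$(P_y)^{\wedge}(x) = (\hat{f})^{\wedge}(x) = f(-x) = e^{-2\pi y|x|} \qquad (11.79)$$
for all $x \in \mathbf{R}$ (the Fourier Inversion Formula gives the middle equality for almost every $x$, and it then holds for every $x$ because both sides are continuous).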
Now we can prove that the map on $L^1(\mathbf{R})$ defined by $f \mapsto \hat{f}$ is one-to-one.


11.80 functions are determined by their Fourier transforms


Suppose $f \in L^1(\mathbf{R})$ and $\hat{f}(t) = 0$ for every $t \in \mathbf{R}$. Then $f = 0$.


Proof Because $\hat{f} = 0$, we also have $(\hat{f})^{\wedge} = 0$. The Fourier Inversion Formula (11.76) now implies that $f = 0$.


The next result could be proved directly using the definition of convolution and 
Tonelli’s/Fubini’s Theorems. However, the following cute proof deserves to be seen. 


11.81 convolution is associative 


Suppose $f, g, h \in L^1(\mathbf{R})$. Then $(f*g)*h = f*(g*h)$.


Proof The Fourier transform of $(f*g)*h$ and the Fourier transform of $f*(g*h)$ both equal $\hat{f}\hat{g}\hat{h}$ (by 11.66). Because the Fourier transform is a one-to-one mapping on $L^1(\mathbf{R})$ [see 11.80], this implies that $(f*g)*h = f*(g*h)$.

Extending Fourier Transform to $L^2(\mathbf{R})$

We now prove that the map $f \mapsto \hat{f}$ preserves $L^2(\mathbf{R})$ norms on $L^1(\mathbf{R}) \cap L^2(\mathbf{R})$.


11.82 Plancherel's Theorem: Fourier transform preserves $L^2(\mathbf{R})$ norms


Suppose $f \in L^1(\mathbf{R}) \cap L^2(\mathbf{R})$. Then $\|\hat{f}\|_2 = \|f\|_2$.


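Proof First consider the case where $\hat{f} \in L^1(\mathbf{R})$ in addition to the hypothesis that $f \in L^1(\mathbf{R}) \cap L^2(\mathbf{R})$. Define $g\colon \mathbf{R} \to \mathbf{C}$ by $g(x) = \overline{f(-x)}$. Then $\hat{g}(t) = \overline{\hat{f}(t)}$ for all $t \in \mathbf{R}$, as is easy to verify. Now
$$
\begin{aligned}
\|f\|_2^2 &= \int_{-\infty}^{\infty} f(x)\,\overline{f(x)}\,dx\\
&= \int_{-\infty}^{\infty} f(-x)\,\overline{f(-x)}\,dx\\
&= \int_{-\infty}^{\infty} (\hat{f})^{\wedge}(x)\,g(x)\,dx \qquad (11.83)\\
&= \int_{-\infty}^{\infty} \hat{f}(x)\,\hat{g}(x)\,dx \qquad (11.84)\\
&= \int_{-\infty}^{\infty} \hat{f}(x)\,\overline{\hat{f}(x)}\,dx\\
&= \|\hat{f}\|_2^2,
\end{aligned}
$$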
where 11.83 holds by the Fourier Inversion Formula (11.76) and 11.84 follows from 11.59. The equation above shows that our desired result holds in the case when $\hat{f} \in L^1(\mathbf{R})$.

Now consider arbitrary $f \in L^1(\mathbf{R}) \cap L^2(\mathbf{R})$. If $y > 0$, then $f * P_y \in L^1(\mathbf{R}) \cap L^2(\mathbf{R})$ by 11.64. If $x \in \mathbf{R}$, then
$$
\begin{aligned}
(f * P_y)^{\wedge}(x) &= \hat{f}(x)\,(P_y)^{\wedge}(x)\\
&= \hat{f}(x)\,e^{-2\pi y|x|}, \qquad (11.85)
\end{aligned}
$$
where the first equality above comes from 11.66 and the second equality comes from 11.79. The equation above and the bound $\|\hat{f}\|_\infty \le \|f\|_1$ (see 11.49) show that $(f * P_y)^{\wedge} \in L^1(\mathbf{R})$. Thus we can apply the first case to $f * P_y$, concluding that
$$\|f * P_y\|_2 = \|(f * P_y)^{\wedge}\|_2.$$
As $y \downarrow 0$, the left side of the equation above converges to $\|f\|_2$ [by 11.74]. As $y \downarrow 0$, the right side of the equation above converges to $\|\hat{f}\|_2$ [by the explicit formula for $(f * P_y)^{\wedge}$ given in 11.85 and the Monotone Convergence Theorem (3.11)]. Thus the equation above implies that $\|\hat{f}\|_2 = \|f\|_2$.
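Plancherel's Theorem can be checked on the two functions of Example 11.48. For $f(x) = e^{-2\pi|x|}$, both $\|f\|_2^2$ and $\|\hat{f}\|_2^2$ equal $1/(2\pi)$. For $\chi_{[0,1]}$, Example 11.48(a) gives $|\hat{\chi}(t)| = |\sin(\pi t)/(\pi t)|$, so $\|\hat{\chi}\|_2^2$ should equal $\|\chi_{[0,1]}\|_2^2 = 1$. The sketch below uses Simpson's rule; the substitution $t = \tan\theta$ and the tail estimate $1/(\pi^2 T)$ (which uses that $\sin^2$ averages to $1/2$) are our own devices.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

# Pair 1: f(x) = e^{-2 pi |x|} and f^(t) = 1/(pi(1+t^2)) (Example 11.48(b)).
f_norm_sq = simpson(lambda x: math.exp(-4 * math.pi * abs(x)), -10.0, 10.0, 20000)
# With t = tan(theta):  int dt / (pi^2 (1+t^2)^2) = (1/pi^2) int_{-pi/2}^{pi/2} cos^2(theta) d theta.
f_hat_norm_sq = simpson(lambda th: math.cos(th) ** 2 / math.pi ** 2, -math.pi / 2, math.pi / 2, 2000)

# Pair 2: chi_[0,1], whose transform has modulus |sin(pi t)/(pi t)| (Example 11.48(a)).
def sinc_sq(t):
    return 1.0 if t == 0 else (math.sin(math.pi * t) / (math.pi * t)) ** 2

T = 1000
# The integrand is even; the tail 2 * int_T^inf sinc^2 is approximately 1/(pi^2 T).
chi_hat_norm_sq = 2 * simpson(sinc_sq, 0.0, T, 200 * T) + 1 / (math.pi ** 2 * T)

print(f_norm_sq, f_hat_norm_sq, 1 / (2 * math.pi))
print(chi_hat_norm_sq)
```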


Because $L^1(\mathbf{R}) \cap L^2(\mathbf{R})$ is dense in $L^2(\mathbf{R})$, Plancherel's Theorem (11.82) allows us to extend the map $f \mapsto \hat{f}$ uniquely to a bounded linear map from $L^2(\mathbf{R})$ to $L^2(\mathbf{R})$ (see Exercise 14 in Section 6C). This extension is called the Fourier transform on $L^2(\mathbf{R})$; it gets its own notation, as shown below.


The Fourier transform $\mathcal{F}$ on $L^2(\mathbf{R})$ is the bounded operator on $L^2(\mathbf{R})$ such that $\mathcal{F}f = \hat{f}$ for all $f \in L^1(\mathbf{R}) \cap L^2(\mathbf{R})$.


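For $f \in L^1(\mathbf{R}) \cap L^2(\mathbf{R})$, we can use either $\hat{f}$ or $\mathcal{F}f$ to denote the Fourier transform of $f$. But if $f \in L^1(\mathbf{R}) \setminus L^2(\mathbf{R})$, we will use only the notation $\hat{f}$, and if $f \in L^2(\mathbf{R}) \setminus L^1(\mathbf{R})$, we will use only the notation $\mathcal{F}f$.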

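Suppose $f \in L^2(\mathbf{R}) \setminus L^1(\mathbf{R})$ and $t \in \mathbf{R}$. Do not make the mistake of thinking that $(\mathcal{F}f)(t)$ equals
$$\int_{-\infty}^{\infty} f(x)\,e^{-2\pi i t x}\,dx.$$
Indeed, the integral above makes no sense because $|f(x)\,e^{-2\pi i t x}| = |f(x)|$ and $f \notin L^1(\mathbf{R})$. Instead of defining $\mathcal{F}f$ via the equation above, $\mathcal{F}f$ must be defined as the limit in $L^2(\mathbf{R})$ of $\hat{f}_1, \hat{f}_2, \ldots$, where $f_1, f_2, \ldots$ is a sequence in $L^1(\mathbf{R}) \cap L^2(\mathbf{R})$ such that
$$\|f - f_n\|_2 \to 0 \text{ as } n \to \infty.$$
For example, one could take $f_n = f\chi_{[-n,n]}$ because $\|f - f\chi_{[-n,n]}\|_2 \to 0$ as $n \to \infty$ by the Dominated Convergence Theorem (3.31).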
Because $\mathcal{F}$ is obtained by continuously extending [in the norm of $L^2(\mathbf{R})$] the Fourier transform from $L^1(\mathbf{R}) \cap L^2(\mathbf{R})$ to $L^2(\mathbf{R})$, we know that $\|\mathcal{F}f\|_2 = \|f\|_2$ for all $f \in L^2(\mathbf{R})$. In other words, $\mathcal{F}$ is an isometry on $L^2(\mathbf{R})$. The next result shows that even more is true.


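11.87 properties of the Fourier transform on $L^2(\mathbf{R})$


(a) $\mathcal{F}$ is a unitary operator on $L^2(\mathbf{R})$.

(b) $\mathcal{F}^4 = I$.

(c) $\operatorname{sp}(\mathcal{F}) = \{1, i, -1, -i\}$.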


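Proof First we prove (b). Suppose $f \in L^1(\mathbf{R}) \cap L^2(\mathbf{R})$. If $y > 0$, then $P_y \in L^1(\mathbf{R})$ and hence 11.64 implies that
$$f * P_y \in L^1(\mathbf{R}) \cap L^2(\mathbf{R}). \qquad (11.88)$$
Also,
$$(f * P_y)^{\wedge} \in L^1(\mathbf{R}) \cap L^2(\mathbf{R}), \qquad (11.89)$$
as follows from the equation $(f * P_y)^{\wedge} = \hat{f}\cdot(P_y)^{\wedge}$ [see 11.66] and the observation that $\hat{f} \in L^\infty(\mathbf{R})$, $(P_y)^{\wedge} \in L^1(\mathbf{R})$ [see 11.49 and 11.79] and the observation that $\hat{f} \in L^2(\mathbf{R})$, $(P_y)^{\wedge} \in L^\infty(\mathbf{R})$ [see 11.82 and 11.49].

Now the Fourier Inversion Formula (11.76) as applied to $f * P_y$ (which is valid by 11.88 and 11.89) implies that
$$\mathcal{F}^4(f * P_y) = f * P_y.$$
Taking the limit in $L^2(\mathbf{R})$ of both sides of the equation above as $y \downarrow 0$, we have $\mathcal{F}^4 f = f$ (by 11.74). Because $L^1(\mathbf{R}) \cap L^2(\mathbf{R})$ is dense in $L^2(\mathbf{R})$ and $\mathcal{F}^4$ is a bounded operator, this completes the proof of (b).

Plancherel's Theorem (11.82) tells us that $\mathcal{F}$ is an isometry on $L^2(\mathbf{R})$. Part (b) implies that $\mathcal{F}$ is surjective. Because a surjective isometry is unitary (see 10.61), we conclude that $\mathcal{F}$ is unitary, completing the proof of (a).

The Spectral Mapping Theorem [see 10.40; take $p(z) = z^4$] and (b) imply that $\alpha^4 = 1$ for each $\alpha \in \operatorname{sp}(\mathcal{F})$. In other words, $\operatorname{sp}(\mathcal{F}) \subset \{1, i, -1, -i\}$. However, $1$, $i$, $-1$, $-i$ are all eigenvalues of $\mathcal{F}$ (see Example 11.51 and Exercises 2, 3, and 4) and thus are all in $\operatorname{sp}(\mathcal{F})$. Hence $\operatorname{sp}(\mathcal{F}) = \{1, i, -1, -i\}$, completing the proof of (c).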
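The eigenvalues $1$, $-i$, and $-1$ used in the proof of (c) can be observed numerically. The sketch below assumes the results of Example 11.51 and Exercises 2 and 3 ($\hat{F} = F$, $\hat{F} = -iF$, and $\hat{F} = -F$ for the three functions listed) and checks $\hat{F}(t) = \lambda F(t)$ at a few points with Simpson's rule; it does not address the eigenvalue $i$ of Exercise 4.

```python
import cmath
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def fourier(F, t, lo=-10.0, hi=10.0):
    """Approximate F^(t) = int F(x) e^{-2 pi i t x} dx by Simpson's rule on [lo, hi]."""
    return simpson(lambda x: F(x) * cmath.exp(-2j * math.pi * t * x), lo, hi)

cases = [
    (lambda x: math.exp(-math.pi * x * x), 1),                               # Example 11.51
    (lambda x: x * math.exp(-math.pi * x * x), -1j),                         # Exercise 2
    (lambda x: (4 * math.pi * x * x - 1) * math.exp(-math.pi * x * x), -1),  # Exercise 3
]
for F, eigenvalue in cases:
    err = max(abs(fourier(F, t) - eigenvalue * F(t)) for t in (0.0, 0.4, 1.3))
    print(eigenvalue, err)
```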


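EXERCISES 11C


1 Suppose $f \in L^1(\mathbf{R})$. Prove that $\|\hat{f}\|_\infty = \|f\|_1$ if and only if there exist $\zeta \in \partial\mathbf{D}$ and $t \in \mathbf{R}$ such that $\zeta f(x)\,e^{-2\pi i t x} \ge 0$ for almost every $x \in \mathbf{R}$.


2 Suppose $f(x) = x e^{-\pi x^2}$ for all $x \in \mathbf{R}$. Show that $\hat{f} = -if$.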
3 Suppose $f(x) = 4\pi x^2 e^{-\pi x^2} - e^{-\pi x^2}$ for all $x \in \mathbf{R}$. Show that $\hat{f} = -f$.
4 Find $f \in L^1(\mathbf{R})$ such that $f \ne 0$ and $\hat{f} = if$.


5 Prove that if $p$ is a polynomial on $\mathbf{R}$ with complex coefficients and $f\colon \mathbf{R} \to \mathbf{C}$ is defined by $f(x) = p(x)e^{-\pi x^2}$, then there exists a polynomial $q$ on $\mathbf{R}$ with complex coefficients such that $\deg q = \deg p$ and $\hat{f}(t) = q(t)e^{-\pi t^2}$ for all $t \in \mathbf{R}$.


6 Suppose

$$f(x) = \begin{cases} x e^{-2\pi x} & \text{if } x > 0, \\ 0 & \text{if } x \leq 0. \end{cases}$$

Show that $\hat{f}(t) = \dfrac{1}{4\pi^2 (1 + it)^2}$ for all $t \in \mathbf{R}$.
7 Prove the formulas in 11.55 for the Fourier transforms of translations, rotations, and dilations.

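8 Suppose $f \in L^1(\mathbf{R})$ and $n \in \mathbf{Z}^+$. Define $g\colon \mathbf{R} \to \mathbf{C}$ by $g(x) = x^n f(x)$. Prove that if $g \in L^1(\mathbf{R})$, then $\hat{f}$ is $n$ times continuously differentiable on $\mathbf{R}$ and

$$(\hat{f})^{(n)}(t) = (-2\pi i)^n \hat{g}(t)$$

for all $t \in \mathbf{R}$.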


9 Suppose $n \in \mathbf{Z}^+$ and $f \in L^1(\mathbf{R})$ is $n$ times continuously differentiable and $f^{(n)} \in L^1(\mathbf{R})$. Prove that if $t \in \mathbf{R}$, then

$$\widehat{f^{(n)}}(t) = (2\pi i t)^n \hat{f}(t).$$


10 Suppose $1 < p < \infty$, $f \in L^p(\mathbf{R})$, and $g \in L^{p'}(\mathbf{R})$. Prove that $f * g$ is a uniformly continuous function on $\mathbf{R}$.


11 Suppose $f \in \mathcal{L}^\infty(\mathbf{R})$, $x \in \mathbf{R}$, and $f$ is continuous at $x$. Prove that

$$\lim_{y \downarrow 0} (\mathcal{P}_y f)(x) = f(x).$$


12 Suppose $p \in [1, \infty]$ and $f \in L^p(\mathbf{R})$. Prove that $\mathcal{P}_y(\mathcal{P}_{y'} f) = \mathcal{P}_{y+y'} f$ for all $y, y' > 0$.


13 Suppose $p \in [1, \infty]$ and $f \in L^p(\mathbf{R})$. Prove that if $0 < y < y'$, then $\|\mathcal{P}_{y'} f\|_p \leq \|\mathcal{P}_y f\|_p$.


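14 Suppose $f \in L^1(\mathbf{R})$.

(a) Prove that $\widehat{\overline{f}}(t) = \overline{\hat{f}(-t)}$ for all $t \in \mathbf{R}$.

(b) Prove that $f(x) \in \mathbf{R}$ for almost every $x \in \mathbf{R}$ if and only if $\hat{f}(t) = \overline{\hat{f}(-t)}$ for all $t \in \mathbf{R}$.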


15 Define $f \in L^1(\mathbf{R})$ by $f(x) = e^{-x} \chi_{[0, \infty)}(x)$. Show that $\hat{f} \notin L^1(\mathbf{R})$.

16 Suppose $f \in L^1(\mathbf{R})$ and $\hat{f} \in L^1(\mathbf{R})$. Prove that $f \in L^2(\mathbf{R})$ and $\hat{f} \in L^2(\mathbf{R})$.

17 Prove there exists a continuous function $g\colon \mathbf{R} \to \mathbf{R}$ such that $\lim_{|t| \to \infty} g(t) = 0$ and $g \notin \{\hat{f} : f \in L^1(\mathbf{R})\}$.

18 Prove that if $f \in L^1(\mathbf{R})$, then $\|\hat{f}\|_2 = \|f\|_2$.

[This exercise slightly improves Plancherel’s Theorem (11.82) because here we have the weaker hypothesis that $f \in L^1(\mathbf{R})$ instead of $f \in L^1(\mathbf{R}) \cap L^2(\mathbf{R})$. Because of Plancherel’s Theorem, here you need only prove that if $f \in L^1(\mathbf{R})$ and $\|f\|_2 = \infty$, then $\|\hat{f}\|_2 = \infty$.]

19 Suppose $y > 0$. Define an operator $T$ on $L^2(\mathbf{R})$ by $Tf = f * P_y$.

(a) Show that $T$ is a self-adjoint operator on $L^2(\mathbf{R})$.

(b) Show that $\operatorname{sp}(T) = [0, 1]$.

[Because the spectrum of each compact operator is a countable set (by 10.93), part (b) above implies that $T$ is not a compact operator. This conclusion differs from the situation on the unit circle; see Exercise 9 in Section 11B.]

20 Prove that if $f \in L^1(\mathbf{R})$ and $g \in L^2(\mathbf{R})$, then $\mathcal{F}(f * g) = \hat{f} \cdot \mathcal{F}g$.

21 Prove that if $f, g \in L^2(\mathbf{R})$, then $\widehat{fg} = (\mathcal{F}f) * (\mathcal{F}g)$.
Probability Measures


Probability theory has become increasingly important in multiple parts of science. 
Getting deeply into probability theory requires a full book, not just a chapter. For 
readers who intend to pursue further studies in probability theory, this chapter gives 
you a good head start. For readers not intending to delve further into probability 
theory, this chapter gives you a taste of the subject. 

Modern probability theory makes major use of measure theory. As we will see, a 
probability measure is simply a measure such that the measure of the whole space 
equals 1. Thus a thorough understanding of the chapters of this book dealing with 
measure theory and integration provides a solid foundation for probability theory. 

However, probability theory is not simply the special case of measure theory where 
the whole space has measure 1. The questions that probability theory investigates 
differ from the questions natural to measure theory. For example, the probability 
notions of independent sets and independent random variables, which are introduced 
in this chapter, do not arise in measure theory. 

Even when concepts in probability theory have the same meaning as well-known 
concepts in measure theory, the terminology and notation can be quite different. Thus 
one goal of this chapter is to introduce the vocabulary of probability theory. This 
difference in vocabulary between probability theory and measure theory occurred 
because the two subjects had different historical developments, only coming together 
in the first half of the twentieth century. 


[Figure: Dice used in games of chance. The beginning of probability theory can be traced to correspondence in 1654 between Pierre de Fermat (1601–1665) and Blaise Pascal (1623–1662) about how to distribute fairly money bet on an unfinished game of dice.]
Probability Spaces


We begin with an intuitive and nonrigorous motivation. Suppose we pick a real number at random from the interval $(0, 1)$, with each real number having an equal probability of being chosen (whatever that means). What is the probability that the chosen number is in the interval $(\frac{9}{10}, 1)$? The only reasonable answer to this question is $\frac{1}{10}$. More generally, if $I_1, I_2, \dots$ is a disjoint sequence of open intervals contained in $(0, 1)$, then the probability that our randomly chosen real number is in $\bigcup_{n=1}^{\infty} I_n$ should be $\sum_{n=1}^{\infty} \ell(I_n)$, where $\ell(I)$ denotes the length of an interval $I$. Still more generally, if $A$ is a Borel subset of $(0, 1)$, then the probability that our random number is in $A$ should be the Lebesgue measure of $A$.

With the paragraph above as motivation, we are now ready to define a probability 
measure. We will use the notation and terminology common in probability theory 
instead of the conventions of measure theory. 

In particular, the set in which everything takes place is now called $\Omega$ instead of the usual $X$ in measure theory. The $\sigma$-algebra on $\Omega$ is called $\mathcal{F}$ instead of $\mathcal{S}$, which we have used in previous chapters. Our measure is now called $P$ instead of $\mu$. This
new notation and terminology can be disorienting when first encountered. However, 
reading this chapter should help you become comfortable with this notation and 
terminology, which are standard in probability theory. 


12.1 Definition probability measure

Suppose $\mathcal{F}$ is a $\sigma$-algebra on a set $\Omega$.

• A probability measure on $(\Omega, \mathcal{F})$ is a measure $P$ on $(\Omega, \mathcal{F})$ such that $P(\Omega) = 1$.

• $\Omega$ is called the sample space.

• An event is an element of $\mathcal{F}$ ($\mathcal{F}$ need not be mentioned if it is clear from the context).

• If $A$ is an event, then $P(A)$ is called the probability of $A$.

• If $P$ is a probability measure on $(\Omega, \mathcal{F})$, then the triple $(\Omega, \mathcal{F}, P)$ is called a probability space.


12.2 Example probability measures

• Suppose $n \in \mathbf{Z}^+$ and $\Omega$ is a set containing exactly $n$ elements. Let $\mathcal{F}$ denote the collection of all subsets of $\Omega$. Then

$$\frac{\text{counting measure on } \Omega}{n}$$

is a probability measure on $(\Omega, \mathcal{F})$.
• As a more specific example of the previous item, suppose that $\Omega = \{40, 41, \dots, 49\}$ and $P = (\text{counting measure on } \Omega)/10$. Let $A = \{\omega \in \Omega : \omega \text{ is even}\}$ and $B = \{\omega \in \Omega : \omega \text{ is prime}\}$. Then $P(A)$ [which is the probability that an element of this sample space $\Omega$ is even] is $\frac{1}{2}$ and $P(B)$ [which is the probability that an element of this sample space $\Omega$ is prime] is $\frac{3}{10}$. This example illustrates the common practice in probability theory of using lower case $\omega$ to denote a typical element of upper case $\Omega$.


• Let $\lambda$ denote Lebesgue measure on the interval $[0, 1]$. Then $\lambda$ is a probability measure on $([0, 1], \mathcal{B})$, where $\mathcal{B}$ denotes the $\sigma$-algebra of Borel subsets of $[0, 1]$.


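• Let $\lambda$ denote Lebesgue measure on $\mathbf{R}$, and let $\mathcal{B}$ denote the $\sigma$-algebra of Borel subsets of $\mathbf{R}$. Define $h\colon \mathbf{R} \to (0, \infty)$ by $h(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Then $h \, d\lambda$ is a probability measure on $(\mathbf{R}, \mathcal{B})$ [see 9.6 for the definition of $h \, d\lambda$].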
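The second item above can be checked by direct counting. The following is a minimal sketch. It models $\Omega = \{40, \dots, 49\}$ as a Python `range` and normalized counting measure as a `Fraction`. The helper names `P` and `is_prime` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# Sample space from the example: Omega = {40, 41, ..., 49}, P = counting measure / 10.
omega = range(40, 50)

def P(event):
    """Normalized counting measure of a subset of omega."""
    return Fraction(len(event), len(omega))

def is_prime(n):
    """Trial division; adequate for small n."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

A = {w for w in omega if w % 2 == 0}   # w is even: 40, 42, 44, 46, 48
B = {w for w in omega if is_prime(w)}  # w is prime: 41, 43, 47

print(P(A), P(B))  # 1/2 3/10
```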


In measure theory, we used the notation $\chi_A$ to denote the characteristic function
of a set A. In probability theory, this function has a different name and different 
notation, as we see in the next definition. 


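12.3 Definition indicator function; $\mathbb{1}_A$

If $\Omega$ is a set and $A \subseteq \Omega$, then the indicator function of $A$ is the function $\mathbb{1}_A\colon \Omega \to \mathbf{R}$ defined by

$$\mathbb{1}_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A. \end{cases}$$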


The next definition gives the replacement in probability theory for measure theory’s 
phrase almost every. 


12.4 Definition almost surely 


Suppose $(\Omega, \mathcal{F}, P)$ is a probability space. An event $A$ is said to happen almost surely if the probability of $A$ is 1, or equivalently if $P(\Omega \setminus A) = 0$.


12.5 Example almost surely 


Let $P$ denote Lebesgue measure on the interval $[0, 1]$. If $\omega \in [0, 1]$, then $\omega$ is almost surely an irrational number (because the set of rational numbers has Lebesgue measure 0).

This example shows that an event having probability 1 (equivalent to happening 
almost surely) does not mean that the event definitely happens. Conversely, an event 
having probability 0 does not mean that the event is impossible. Specifically, if a real 
number is chosen at random from [0,1] using Lebesgue measure as the probability, 
then the probability that the number is rational is 0, but that event can still happen. 
The following result is frequently useful in probability theory. A careful reading of
the proof of this result, as our first proof in this chapter, should give you good practice 
using some of the notation and terminology commonly used in probability theory. 
This proof also illustrates the point that having a good understanding of measure 
theory and integration can often be extremely useful in probability theory—here we 
use the Monotone Convergence Theorem. 


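12.6 Borel–Cantelli Lemma

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $A_1, A_2, \dots$ is a sequence of events such that $\sum_{n=1}^{\infty} P(A_n) < \infty$. Then

$$P(\{\omega \in \Omega : \omega \in A_n \text{ for infinitely many } n \in \mathbf{Z}^+\}) = 0.$$

Proof Let $A = \{\omega \in \Omega : \omega \in A_n \text{ for infinitely many } n \in \mathbf{Z}^+\}$. Then

$$A = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} A_n.$$

Thus $A \in \mathcal{F}$, and hence $P(A)$ makes sense.

The Monotone Convergence Theorem (3.11) implies that

$$\int_\Omega \Bigl(\sum_{n=1}^{\infty} \mathbb{1}_{A_n}\Bigr) dP = \sum_{n=1}^{\infty} \int_\Omega \mathbb{1}_{A_n} \, dP = \sum_{n=1}^{\infty} P(A_n) < \infty.$$

Thus $\sum_{n=1}^{\infty} \mathbb{1}_{A_n}$ is almost surely finite. Because $\sum_{n=1}^{\infty} \mathbb{1}_{A_n}(\omega) = \infty$ for every $\omega \in A$, this implies that $P(A) = 0$.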
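A simulation can make the Borel–Cantelli Lemma concrete, although it proves nothing. The sketch below makes several assumptions. It takes $P(A_n) = 1/n^2$, so that $\sum_{n=1}^{\infty} P(A_n) = \pi^2/6 < \infty$. It simulates the events independently, which the lemma does not require. It also ignores all $n$ beyond a hypothetical cutoff `N_EVENTS`. Each simulated outcome lands in only a handful of the events $A_n$, and the average count is close to $\sum_{n} P(A_n)$, which is exactly the integral in the proof.

```python
import random

# Hypothetical setup: independent events A_1, A_2, ... with P(A_n) = 1/n^2,
# truncated at N_EVENTS (the omitted tail has total probability below 1/N_EVENTS).
N_EVENTS = 500
N_PATHS = 1000

def occurrences(rng):
    """Number of n <= N_EVENTS such that the simulated outcome lies in A_n."""
    return sum(1 for n in range(1, N_EVENTS + 1) if rng.random() < 1 / n**2)

rng = random.Random(12345)  # fixed seed so the run is reproducible
counts = [occurrences(rng) for _ in range(N_PATHS)]

# The average count estimates the integral of sum_n 1_{A_n} dP = sum_n P(A_n).
mean_count = sum(counts) / N_PATHS
expected = sum(1 / n**2 for n in range(1, N_EVENTS + 1))  # close to pi^2/6
print(mean_count, expected, max(counts))
```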


Independent Events and Independent Random Variables 


The notion of independent events, which we now define, is one of the key concepts 
that distinguishes probability theory from measure theory. 


12.7 Definition independent events

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space.

• Two events $A$ and $B$ are called independent if

$$P(A \cap B) = P(A) \cdot P(B).$$

• More generally, a family of events $\{A_k\}_{k \in \Gamma}$ is called independent if

$$P(A_{k_1} \cap \dots \cap A_{k_n}) = P(A_{k_1}) \cdots P(A_{k_n})$$

whenever $k_1, \dots, k_n$ are distinct elements of $\Gamma$.


The next two examples should help develop your intuition about independent 
events. 
12.8 Example independent events: coin tossing


Suppose $\Omega = \{H, T\}^4$, where $H$ and $T$ are symbols that you can think of as denoting “heads” and “tails”. Thus elements of $\Omega$ are 4-tuples of the form

$$\omega = (\omega_1, \omega_2, \omega_3, \omega_4),$$

where each $\omega_j$ is $H$ or $T$. Let $\mathcal{F}$ be the collection of all subsets of $\Omega$, and let $P = (\text{counting measure on } \Omega)/16$, as we expect from a fair coin toss.

Let

$$A = \{\omega \in \Omega : \omega_1 = \omega_2 = \omega_3 = H\} \quad\text{and}\quad B = \{\omega \in \Omega : \omega_4 = H\}.$$

Then $A$ contains two elements and thus $P(A) = \frac{1}{8}$, corresponding to probability $\frac{1}{8}$ that the first three coin tosses are all heads. Also, $B$ contains eight elements and thus $P(B) = \frac{1}{2}$, corresponding to probability $\frac{1}{2}$ that the fourth coin toss is heads.

Now

$$P(A \cap B) = \tfrac{1}{16} = P(A) \cdot P(B),$$

where the first equality holds because $A \cap B$ consists of only the one element $(H, H, H, H)$ and the second equality holds because $P(A) = \frac{1}{8}$ and $P(B) = \frac{1}{2}$. The equation above shows that $A$ and $B$ are independent events.
The equation above shows that A and B are independent events. 

If we toss a fair coin many times, we expect that about half the time it will be 
heads. Thus some people mistakenly believe that if the first three tosses of a fair 
coin are heads, then the fourth toss should have a higher probability of being tails, 
to balance out the previous heads. However, the coin cannot remember that it had 
three heads in a row, and thus the fourth coin toss has probability $\frac{1}{2}$ of being heads
regardless of the results of the three previous coin tosses. The independence of the 
events A and B above captures the notion that the results of a fair coin toss do not 
depend upon previous results. 
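Because $\Omega$ has only 16 elements, Example 12.8 can be verified by brute-force enumeration. The following minimal sketch builds $\Omega = \{H, T\}^4$ with `itertools.product` and uses exact rational arithmetic. The representation of outcomes as tuples of the strings `"H"` and `"T"` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Omega = {H, T}^4 with P = counting measure / 16, as in Example 12.8.
omega = list(product("HT", repeat=4))

def P(event):
    """Normalized counting measure of a set of outcomes."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == w[1] == w[2] == "H"}  # first three tosses are heads
B = {w for w in omega if w[3] == "H"}                  # fourth toss is heads

print(P(A), P(B), P(A & B))  # 1/8 1/2 1/16, so P(A & B) == P(A) * P(B)
```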


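12.9 Example independent events: product probability space

Suppose $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$ are probability spaces. Then

$$(\Omega_1 \times \Omega_2, \mathcal{F}_1 \otimes \mathcal{F}_2, P_1 \times P_2),$$

as defined in Chapter 5, is also a probability space.

If $A \in \mathcal{F}_1$ and $B \in \mathcal{F}_2$, then $(A \times \Omega_2) \cap (\Omega_1 \times B) = A \times B$. Thus

$$\begin{aligned}
(P_1 \times P_2)\bigl((A \times \Omega_2) \cap (\Omega_1 \times B)\bigr) &= (P_1 \times P_2)(A \times B) \\
&= P_1(A) \cdot P_2(B) \\
&= (P_1 \times P_2)(A \times \Omega_2) \cdot (P_1 \times P_2)(\Omega_1 \times B),
\end{aligned}$$

where the second equality follows from the definition of the product measure, and the third equality holds because of the definition of the product measure and because $P_1$ and $P_2$ are probability measures.

The equation above shows that the events $A \times \Omega_2$ and $\Omega_1 \times B$ are independent events in $\mathcal{F}_1 \otimes \mathcal{F}_2$.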
Compare the next result to the Borel–Cantelli Lemma (12.6).


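12.10 relative of Borel–Cantelli Lemma

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $\{A_n\}_{n \in \mathbf{Z}^+}$ is an independent family of events such that $\sum_{n=1}^{\infty} P(A_n) = \infty$. Then

$$P(\{\omega \in \Omega : \omega \in A_n \text{ for infinitely many } n \in \mathbf{Z}^+\}) = 1.$$

Proof Let $A = \{\omega \in \Omega : \omega \in A_n \text{ for infinitely many } n \in \mathbf{Z}^+\}$. Then

$$\Omega \setminus A = \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} (\Omega \setminus A_n). \tag{12.11}$$

If $m, M \in \mathbf{Z}^+$ are such that $m \leq M$, then

$$\begin{aligned}
P\Bigl(\bigcap_{n=m}^{M} (\Omega \setminus A_n)\Bigr) &= \prod_{n=m}^{M} P(\Omega \setminus A_n) \\
&= \prod_{n=m}^{M} \bigl(1 - P(A_n)\bigr) \\
&\leq e^{-\sum_{n=m}^{M} P(A_n)},
\end{aligned} \tag{12.12}$$

where the first line holds because the family $\{\Omega \setminus A_n\}_{n \in \mathbf{Z}^+}$ is independent (see Exercise 4) and the third line holds because $1 - t \leq e^{-t}$ for all $t \geq 0$.

Because $\sum_{n=m}^{\infty} P(A_n) = \infty$, by choosing $M$ large we can make the right side of 12.12 as close to 0 as we wish. Because $\bigcap_{n=m}^{\infty} (\Omega \setminus A_n) \subseteq \bigcap_{n=m}^{M} (\Omega \setminus A_n)$ for every $M \geq m$, this shows that

$$P\Bigl(\bigcap_{n=m}^{\infty} (\Omega \setminus A_n)\Bigr) = 0$$

for all $m \in \mathbf{Z}^+$. Now 12.11 and the countable subadditivity of $P$ imply that $P(\Omega \setminus A) = 0$. Thus we conclude that $P(A) = 1$, as desired.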
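The key estimate in this proof is the chain of inequalities 12.12. The sketch below evaluates both sides exactly for the hypothetical choice $P(A_n) = \frac{1}{n+1}$. With this choice $\sum_n P(A_n)$ is a tail of the harmonic series and diverges, and the product telescopes to $\prod_{n=m}^{M} \frac{n}{n+1} = \frac{m}{M+1}$. Both sides tend to 0 as $M$ grows, and the product never exceeds the exponential bound.

```python
import math
from fractions import Fraction

def p(n):
    """Hypothetical choice P(A_n) = 1/(n + 1); the series of these diverges."""
    return Fraction(1, n + 1)

def product_of_complements(m, M):
    """Exact value of prod_{n=m}^{M} (1 - P(A_n)), the middle line of 12.12."""
    result = Fraction(1)
    for n in range(m, M + 1):
        result *= 1 - p(n)
    return result

def exponential_bound(m, M):
    """Right side of 12.12: exp(-sum_{n=m}^{M} P(A_n))."""
    return math.exp(-float(sum(p(n) for n in range(m, M + 1))))

for M in (10, 100, 1000):
    prod = product_of_complements(3, M)  # telescopes to 3/(M + 1)
    print(M, float(prod), exponential_bound(3, M))
```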


For the rest of this chapter, assume that $\mathbf{F} = \mathbf{R}$. Thus, for example, if $(\Omega, \mathcal{F}, P)$ is a probability space, then $\mathcal{L}^1(P)$ will always refer to the vector space of real-valued $\mathcal{F}$-measurable functions on $\Omega$ such that $\int_\Omega |f| \, dP < \infty$.
12.13 Definition random variable; expectation; $EX$

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space.

• A random variable on $(\Omega, \mathcal{F})$ is a measurable function from $\Omega$ to $\mathbf{R}$.

• If $X \in \mathcal{L}^1(P)$, then the expectation (sometimes called the expected value) of the random variable $X$ is denoted $EX$ and is defined by

$$EX = \int_\Omega X \, dP.$$
If $\mathcal{F}$ is clear from the context, the phrase “random variable on $\Omega$” can be used instead of the more precise phrase “random variable on $(\Omega, \mathcal{F})$”. If both $\Omega$ and $\mathcal{F}$ are clear from the context, then the phrase “random variable” has no ambiguity and is often used.

Because $P(\Omega) = 1$, the expectation $EX$ of a random variable $X \in \mathcal{L}^1(P)$ can be thought of as the average or mean value of $X$.

The next definition illustrates a convention often used in probability theory: the variable is often omitted when describing a set. Thus, for example, $\{X \in U\}$ means $\{\omega \in \Omega : X(\omega) \in U\}$, where $U$ is a subset of $\mathbf{R}$. Probabilists also often omit the set brackets, as we do for the first time in the second bullet point below, when appropriate parentheses are nearby.


12.14 Definition independent random variables

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space.

• Two random variables $X$ and $Y$ are called independent if $\{X \in U\}$ and $\{Y \in V\}$ are independent events for all Borel sets $U, V$ in $\mathbf{R}$.

• More generally, a family of random variables $\{X_k\}_{k \in \Gamma}$ is called independent if $\{X_k \in U_k\}_{k \in \Gamma}$ is independent for all families of Borel sets $\{U_k\}_{k \in \Gamma}$ in $\mathbf{R}$.


12.15 Example independent random variables

• Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $A, B \in \mathcal{F}$. Then $\mathbb{1}_A$ and $\mathbb{1}_B$ are independent random variables if and only if $A$ and $B$ are independent events, as you should verify.

• Suppose $\Omega = \{H, T\}^4$ is the sample space of four coin tosses, with $\mathcal{F}$ and $P$ as in Example 12.8. Define random variables $X$ and $Y$ by

$$X(\omega_1, \omega_2, \omega_3, \omega_4) = \text{number of } \omega_1, \omega_2, \omega_3 \text{ that equal } H$$

and

$$Y(\omega_1, \omega_2, \omega_3, \omega_4) = \text{number of } \omega_3, \omega_4 \text{ that equal } H.$$

Then $X$ and $Y$ are not independent random variables because $P(X = 3) = \frac{1}{8}$ and $P(Y = 0) = \frac{1}{4}$ but $P(\{X = 3\} \cap \{Y = 0\}) = P(\varnothing) = 0 \neq \frac{1}{8} \cdot \frac{1}{4}$.

• Suppose $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$ are probability spaces, $Z_1$ is a random variable on $\Omega_1$, and $Z_2$ is a random variable on $\Omega_2$. Define random variables $X$ and $Y$ on $\Omega_1 \times \Omega_2$ by

$$X(\omega_1, \omega_2) = Z_1(\omega_1) \quad\text{and}\quad Y(\omega_1, \omega_2) = Z_2(\omega_2).$$

Then $X$ and $Y$ are independent random variables on $\Omega_1 \times \Omega_2$ (with respect to the probability measure $P_1 \times P_2$), as you should verify.
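The second item in the example above can be checked by enumerating the 16 outcomes. In this minimal sketch, events are passed to `P` as Python predicates (a hypothetical convenience), and the random variables $X$ and $Y$ are ordinary functions on 4-tuples.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=4))  # four coin tosses, P = counting measure / 16

def P(event):
    """Probability of {w in omega : event(w)} under normalized counting measure."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def X(w):
    """Number of w1, w2, w3 that equal H."""
    return w[:3].count("H")

def Y(w):
    """Number of w3, w4 that equal H."""
    return w[2:].count("H")

p_x3 = P(lambda w: X(w) == 3)
p_y0 = P(lambda w: Y(w) == 0)
p_both = P(lambda w: X(w) == 3 and Y(w) == 0)
print(p_x3, p_y0, p_both)  # 1/8 1/4 0, and 0 != 1/32
```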


If $X$ is a random variable and $f\colon \mathbf{R} \to \mathbf{R}$ is Borel measurable, then $f \circ X$ is a random variable (by 2.44). For example, if $X$ is a random variable, then $X^2$ and $e^X$ are random variables. The next result states that compositions preserve independence.
12.16 functions of independent random variables are independent


Suppose $(\Omega, \mathcal{F}, P)$ is a probability space, $X$ and $Y$ are independent random variables, and $f, g\colon \mathbf{R} \to \mathbf{R}$ are Borel measurable. Then $f \circ X$ and $g \circ Y$ are independent random variables.

Proof Suppose $U, V$ are Borel subsets of $\mathbf{R}$. Then

$$\begin{aligned}
P(\{f \circ X \in U\} \cap \{g \circ Y \in V\}) &= P(\{X \in f^{-1}(U)\} \cap \{Y \in g^{-1}(V)\}) \\
&= P(X \in f^{-1}(U)) \cdot P(Y \in g^{-1}(V)) \\
&= P(f \circ X \in U) \cdot P(g \circ Y \in V),
\end{aligned}$$

where the second equality holds because $X$ and $Y$ are independent random variables. The equation above shows that $f \circ X$ and $g \circ Y$ are independent random variables.


If $X, Y \in \mathcal{L}^1(P)$, then clearly $E(X + Y) = EX + EY$. The next result gives
a nice formula for the expectation of XY when X and Y are independent. This 
formula has sometimes been called the dream equation of calculus students. 


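12.17 expectation of product of independent random variables

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $X$ and $Y$ are independent random variables in $\mathcal{L}^2(P)$. Then

$$E(XY) = EX \cdot EY.$$

Proof First consider the case where $X$ and $Y$ are each simple functions, taking on only finitely many values. Thus there are distinct numbers $a_1, \dots, a_M \in \mathbf{R}$ and distinct numbers $b_1, \dots, b_N \in \mathbf{R}$ such that

$$X = a_1 \mathbb{1}_{\{X = a_1\}} + \dots + a_M \mathbb{1}_{\{X = a_M\}} \quad\text{and}\quad Y = b_1 \mathbb{1}_{\{Y = b_1\}} + \dots + b_N \mathbb{1}_{\{Y = b_N\}}.$$

Now

$$XY = \sum_{j=1}^{M} \sum_{k=1}^{N} a_j b_k \mathbb{1}_{\{X = a_j\}} \mathbb{1}_{\{Y = b_k\}} = \sum_{j=1}^{M} \sum_{k=1}^{N} a_j b_k \mathbb{1}_{\{X = a_j\} \cap \{Y = b_k\}}.$$

Thus

$$\begin{aligned}
E(XY) &= \sum_{j=1}^{M} \sum_{k=1}^{N} a_j b_k P(\{X = a_j\} \cap \{Y = b_k\}) \\
&= \Bigl(\sum_{j=1}^{M} a_j P(X = a_j)\Bigr) \Bigl(\sum_{k=1}^{N} b_k P(Y = b_k)\Bigr) \\
&= EX \cdot EY,
\end{aligned}$$

where the second equality above comes from the independence of $X$ and $Y$. The last equation gives the desired conclusion in the case where $X$ and $Y$ are simple functions.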




Now consider arbitrary independent random variables $X$ and $Y$ in $\mathcal{L}^2(P)$. Let $f_1, f_2, \dots$ be a sequence of Borel measurable simple functions from $\mathbf{R}$ to $\mathbf{R}$ that approximate the identity function on $\mathbf{R}$ (the function $t \mapsto t$) in the sense that $\lim_{n \to \infty} f_n(t) = t$ for every $t \in \mathbf{R}$ and $|f_n(t)| \leq |t|$ for all $t \in \mathbf{R}$ and all $n \in \mathbf{Z}^+$ (see 2.89, taking $f$ to be the identity function, for construction of this sequence). The random variables $f_n \circ X$ and $f_n \circ Y$ are independent (by 12.16). Thus the result in the first paragraph of this proof shows that

$$E\bigl((f_n \circ X)(f_n \circ Y)\bigr) = E(f_n \circ X) \cdot E(f_n \circ Y)$$

for each $n \in \mathbf{Z}^+$. The limit as $n \to \infty$ of the right side of the equation above equals $EX \cdot EY$ [by the Dominated Convergence Theorem (3.31)]. The limit as $n \to \infty$ of the left side of the equation above equals $E(XY)$ [use Hölder’s inequality (7.9)]. Thus the equation above implies that $E(XY) = EX \cdot EY$.
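For a finite example, 12.17 can be checked exactly. The following sketch assumes two fair dice modeled on the product space $\{1, \dots, 6\}^2$ with normalized counting measure; this is a hypothetical example, not one from the text. It compares $E(XY)$ with $EX \cdot EY$. As a contrast, it also computes $E(X \cdot X)$, where the formula fails because $X$ is not independent of itself.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice on {1,...,6}^2 with P = counting measure / 36.
omega = list(product(range(1, 7), repeat=2))

def E(Z):
    """Expectation of a random variable Z on the finite space omega."""
    return sum(Fraction(Z(w)) for w in omega) / len(omega)

def X(w):   # value of the first die
    return w[0]

def Y(w):   # value of the second die; independent of X
    return w[1]

def XY(w):
    return w[0] * w[1]

print(E(XY), E(X) * E(Y))        # 49/4 49/4
print(E(lambda w: w[0] ** 2))    # 91/6, which differs from (EX)^2 = 49/4
```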

Variance and Standard Deviation 

The variance and standard deviation of a random variable, defined below, measure 
how much a random variable differs from its expectation. 


12.18 Definition variance; standard deviation; $\sigma(X)$

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $X \in \mathcal{L}^2(P)$ is a random variable.

• The variance of $X$ is defined to be $E\bigl((X - EX)^2\bigr)$.

• The standard deviation of $X$ is denoted $\sigma(X)$ and is defined by

$$\sigma(X) = \sqrt{E\bigl((X - EX)^2\bigr)}.$$

In other words, the standard deviation of $X$ is the square root of the variance of $X$.

The notation $\sigma^2(X)$ means $\bigl(\sigma(X)\bigr)^2$. Thus $\sigma^2(X)$ is the variance of $X$.


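12.19 Example variance and standard deviation of an indicator function

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $A \in \mathcal{F}$ is an event. Then

$$\begin{aligned}
\sigma^2(\mathbb{1}_A) &= E\bigl((\mathbb{1}_A - E\mathbb{1}_A)^2\bigr) \\
&= E\bigl((\mathbb{1}_A - P(A))^2\bigr) \\
&= E\bigl(\mathbb{1}_A - 2P(A) \cdot \mathbb{1}_A + P(A)^2\bigr) \\
&= P(A) - 2\bigl(P(A)\bigr)^2 + \bigl(P(A)\bigr)^2 \\
&= P(A) \cdot \bigl(1 - P(A)\bigr).
\end{aligned}$$

Thus $\sigma(\mathbb{1}_A) = \sqrt{P(A) \cdot \bigl(1 - P(A)\bigr)}$.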
The next result gives a formula for the variance of a random variable. This formula
is often more convenient to use than the formula that defines the variance. 


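12.20 variance formula

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $X \in \mathcal{L}^2(P)$ is a random variable. Then

$$\sigma^2(X) = E(X^2) - (EX)^2.$$

Proof We have

$$\begin{aligned}
\sigma^2(X) &= E\bigl((X - EX)^2\bigr) \\
&= E\bigl(X^2 - 2(EX)X + (EX)^2\bigr) \\
&= E(X^2) - 2(EX)^2 + (EX)^2 \\
&= E(X^2) - (EX)^2,
\end{aligned}$$

as desired.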

Our next result is called Chebyshev’s inequality. It states, for example (take $t = 2$ below), that the probability that a random variable $X$ differs from its average by more than twice its standard deviation is at most $\frac{1}{4}$. Note that $P\bigl(|X - EX| > t\sigma(X)\bigr)$ is shorthand for $P\bigl(\{\omega \in \Omega : |X(\omega) - EX| > t\sigma(X)\}\bigr)$.


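12.21 Chebyshev’s inequality

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $X \in \mathcal{L}^2(P)$ is a random variable. Then

$$P\bigl(|X - EX| > t\sigma(X)\bigr) \leq \frac{1}{t^2}$$

for all $t > 0$.

Proof Suppose $t > 0$. If $\sigma(X) = 0$, then $X = EX$ almost surely and the left side above equals 0; thus we can assume that $\sigma(X) > 0$. Then

$$\begin{aligned}
P\bigl(|X - EX| > t\sigma(X)\bigr) &= P\bigl(|X - EX|^2 > t^2\sigma^2(X)\bigr) \\
&\leq \frac{1}{t^2\sigma^2(X)} E\bigl((X - EX)^2\bigr) \\
&= \frac{1}{t^2},
\end{aligned}$$

where the second line above comes from applying Markov’s inequality (4.1) with $h = |X - EX|^2$ and $c = t^2\sigma^2(X)$.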
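To see how conservative Chebyshev’s inequality can be, the sketch below computes the left side exactly for one hypothetical random variable: the value of a single fair die, with $EX = \frac{7}{2}$ and $\sigma^2(X) = \frac{35}{12}$. It then compares that value with the bound $\frac{1}{t^2}$ for several values of $t$.

```python
import math
from fractions import Fraction

# Hypothetical example: X is the value of one fair die, P = counting measure / 6.
values = range(1, 7)
prob = Fraction(1, 6)

mean = sum(prob * v for v in values)                    # EX = 7/2
variance = sum(prob * (v - mean) ** 2 for v in values)  # sigma^2(X) = 35/12
sigma = math.sqrt(variance)

def tail(t):
    """P(|X - EX| > t * sigma(X)), computed exactly from the six outcomes."""
    return sum(prob for v in values if abs(v - mean) > t * sigma)

for t in (1.0, 1.2, 1.5, 2.0):
    print(t, float(tail(t)), 1 / t**2)  # the exact tail never exceeds 1/t^2
```

For $t = 1.5$ and $t = 2$ the exact tail is already 0, while the bound is still $0.44$ and $0.25$. The price of the bound’s generality is that it uses only $\sigma(X)$.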


The next result gives a beautiful formula for the variance of the sum of independent 
random variables. 
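12.22 variance of sum of independent random variables

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $X_1, \dots, X_n \in \mathcal{L}^2(P)$ are independent random variables. Then

$$\sigma^2(X_1 + \dots + X_n) = \sigma^2(X_1) + \dots + \sigma^2(X_n).$$

Proof Using the variance formula given by 12.20, we have

$$\begin{aligned}
\sigma^2\Bigl(\sum_{k=1}^{n} X_k\Bigr) &= E\Bigl(\Bigl(\sum_{k=1}^{n} X_k\Bigr)^2\Bigr) - \Bigl(E\sum_{k=1}^{n} X_k\Bigr)^2 \\
&= E\Bigl(\sum_{k=1}^{n} X_k^2\Bigr) + 2E\Bigl(\sum_{1 \leq j < k \leq n} X_j X_k\Bigr) - \Bigl(\sum_{k=1}^{n} EX_k\Bigr)^2 \\
&= \Bigl(\sum_{k=1}^{n} E(X_k^2) - \sum_{k=1}^{n} (EX_k)^2\Bigr) + 2\Bigl(\sum_{1 \leq j < k \leq n} E(X_j X_k) - \sum_{1 \leq j < k \leq n} EX_j \cdot EX_k\Bigr) \\
&= \sum_{k=1}^{n} \sigma^2(X_k),
\end{aligned}$$

where the last equality uses 12.20, 12.17, and the hypothesis that $X_1, \dots, X_n$ are independent random variables.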
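The following exact check of 12.22 assumes a hypothetical example: three fair dice on $\{1, \dots, 6\}^3$ with normalized counting measure. Each die has variance $\frac{35}{12}$, so the variance of the total should be $\frac{35}{4}$. The sketch computes variances with the formula of 12.20. As a contrast, $\sigma^2(X_1 + X_1) = \frac{35}{3}$, which is not $2\sigma^2(X_1)$, because $X_1$ is not independent of itself.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: three fair dice on {1,...,6}^3 with P = counting measure / 216.
omega = list(product(range(1, 7), repeat=3))

def E(Z):
    """Expectation of a random variable Z on the finite space omega."""
    return sum(Fraction(Z(w)) for w in omega) / len(omega)

def variance(Z):
    """Variance via the formula of 12.20: E(Z^2) - (EZ)^2."""
    return E(lambda w: Z(w) ** 2) - E(Z) ** 2

dice = [lambda w, k=k: w[k] for k in range(3)]  # the random variables X_1, X_2, X_3

def total(w):
    return sum(w)

print(variance(total), sum(variance(X) for X in dice))  # 35/4 35/4
```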


Conditional Probability and Bayes’ Theorem 


The conditional probability $P_B(A)$ that we are about to define should be interpreted to mean the probability that $\omega$ will be in $A$ given that $\omega \in B$. Because $\omega$ is in $A \cap B$ if and only if $\omega \in B$ and $\omega \in A$, and because we expect probabilities to multiply, it is reasonable to expect that

$$P(B) \cdot P_B(A) = P(A \cap B).$$

Thus we are led to the following definition.
12.23 Definition conditional probability; $P_B$

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $B$ is an event with $P(B) > 0$. Define $P_B\colon \mathcal{F} \to [0, 1]$ by

$$P_B(A) = \frac{P(A \cap B)}{P(B)}.$$

If $A \in \mathcal{F}$, then $P_B(A)$ is called the conditional probability of $A$ given $B$.

You should verify that with $B$ as above, $P_B$ is a probability measure on $(\Omega, \mathcal{F})$. If $A \in \mathcal{F}$, then $P_B(A) = P(A)$ if and only if $A$ and $B$ are independent events.

We now present two versions of what is called Bayes’ Theorem. You should 
do a web search and read about the many uses of these results, including some 
controversial applications. 
12.24 Bayes’ Theorem, first version

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $A, B$ are events with positive probability. Then

$$P_B(A) = \frac{P_A(B) \cdot P(A)}{P(B)}.$$

Proof We have

$$P_B(A) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B) \cdot P(A)}{P(A) \cdot P(B)} = \frac{P_A(B) \cdot P(A)}{P(B)}.$$


[Figure: Plaque honoring Thomas Bayes in Tunbridge Wells, England. CC-BY-SA Alexander Dreyer]


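12.25 Bayes’ Theorem, second version

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space, $B$ is an event with positive probability, and $A_1, \dots, A_n$ are pairwise disjoint events, each with positive probability, such that $A_1 \cup \dots \cup A_n = \Omega$. Then

$$P_B(A_k) = \frac{P_{A_k}(B) \cdot P(A_k)}{\sum_{j=1}^{n} P_{A_j}(B) \cdot P(A_j)}$$

for each $k \in \{1, \dots, n\}$.

Proof Consider the denominator of the expression above. We have

$$\sum_{j=1}^{n} P_{A_j}(B) \cdot P(A_j) = \sum_{j=1}^{n} P(A_j \cap B) = P(B), \tag{12.26}$$

where the last equality holds because $A_1 \cap B, \dots, A_n \cap B$ are pairwise disjoint events whose union is $B$.

Now suppose $k \in \{1, \dots, n\}$. Then

$$P_B(A_k) = \frac{P_{A_k}(B) \cdot P(A_k)}{P(B)} = \frac{P_{A_k}(B) \cdot P(A_k)}{\sum_{j=1}^{n} P_{A_j}(B) \cdot P(A_j)},$$

where the first equality comes from the first version of Bayes’ Theorem (12.24) and the second equality comes from 12.26.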
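The second version of Bayes’ Theorem translates directly into a few lines of code. The sketch below uses exact fractions. All numbers are hypothetical and chosen only for illustration: a partition into $A_1$ (“has a condition”, $P(A_1) = \frac{1}{100}$) and $A_2$ (“does not”), and an event $B$ (“test is positive”) with $P_{A_1}(B) = \frac{95}{100}$ and $P_{A_2}(B) = \frac{5}{100}$. The denominator computed in `bayes` is $P(B)$, exactly as in 12.26.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Return [P_B(A_1), ..., P_B(A_n)] as in 12.25.

    priors[j] = P(A_j) for a partition A_1, ..., A_n of Omega;
    likelihoods[j] = P_{A_j}(B).
    """
    # The denominator equals P(B), by 12.26.
    p_b = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / p_b for l, p in zip(likelihoods, priors)]

# Hypothetical numbers: A_1 = "has condition", A_2 = "does not", B = "test is positive".
priors = [Fraction(1, 100), Fraction(99, 100)]
likelihoods = [Fraction(95, 100), Fraction(5, 100)]
posterior = bayes(priors, likelihoods)
print(posterior)  # [Fraction(19, 118), Fraction(99, 118)]
```

Even though the hypothetical test is positive 95% of the time on $A_1$, the conditional probability $P_B(A_1)$ is only $\frac{19}{118} \approx 0.16$, because $P(A_1)$ is small.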
Distribution and Density Functions of Random Variables


For the rest of this chapter, let $\mathcal{B}$ denote the $\sigma$-algebra of Borel subsets of $\mathbf{R}$. Each random variable $X$ determines a probability measure $P_X$ on $(\mathbf{R}, \mathcal{B})$ and a function $\tilde{X}\colon \mathbf{R} \to [0, 1]$ as in the next definition.


12.27 Definition probability distribution and distribution function; $P_X$; $\tilde{X}$

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $X$ is a random variable.

• The probability distribution of $X$ is the probability measure $P_X$ defined on $(\mathbf{R}, \mathcal{B})$ by

$$P_X(B) = P(X \in B) = P\bigl(X^{-1}(B)\bigr).$$

• The distribution function of $X$ is the function $\tilde{X}\colon \mathbf{R} \to [0, 1]$ defined by

$$\tilde{X}(s) = P_X\bigl((-\infty, s]\bigr) = P(X \leq s).$$

You should verify that the probability distribution $P_X$ as defined above is indeed a probability measure on $(\mathbf{R}, \mathcal{B})$. Note that the distribution function $\tilde{X}$ depends upon the probability measure $P$ as well as the random variable $X$, even though $P$ is not included in the notation $\tilde{X}$ (because $P$ is usually clear from the context).


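12.28 Example probability distribution and distribution function of an indicator function

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $A \in \mathcal{F}$ is an event. Then you should verify that

$$P_{\mathbb{1}_A} = \bigl(1 - P(A)\bigr)\delta_0 + P(A)\,\delta_1,$$

where for $t \in \mathbf{R}$ the measure $\delta_t$ on $(\mathbf{R}, \mathcal{B})$ is defined by

$$\delta_t(B) = \begin{cases} 1 & \text{if } t \in B, \\ 0 & \text{if } t \notin B. \end{cases}$$

The distribution function of $\mathbb{1}_A$ is the function $\widetilde{\mathbb{1}_A}\colon \mathbf{R} \to [0, 1]$ given by

$$\widetilde{\mathbb{1}_A}(s) = \begin{cases} 0 & \text{if } s < 0, \\ 1 - P(A) & \text{if } 0 \leq s < 1, \\ 1 & \text{if } s \geq 1, \end{cases}$$

as you should verify.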


One direction of the next result states that every distribution function is a right-continuous increasing function, with limit 0 at $-\infty$ and limit 1 at $\infty$. The other direction of the next result states that every function with those properties is the distribution function of some random variable on some probability space. The proof shows that we can take the sample space to be $(0, 1)$, the $\sigma$-algebra to be the Borel subsets of $(0, 1)$, and the probability measure to be Lebesgue measure on $(0, 1)$.
Your understanding of the proof of the next result should be enhanced by Exercise 13, which asserts that if the function $H\colon \mathbf{R} \to (0, 1)$ appearing in the next result is continuous and injective, then the random variable $X\colon (0, 1) \to \mathbf{R}$ in the proof is the inverse function of $H$.


12.29 characterization of distribution functions

Suppose $H\colon \mathbf{R} \to [0, 1]$ is a function. Then there exists a probability space $(\Omega, \mathcal{F}, P)$ and a random variable $X$ on $(\Omega, \mathcal{F})$ such that $H = \tilde{X}$ if and only if the following conditions are all satisfied:

(a) $s < t \implies H(s) \leq H(t)$ (in other words, $H$ is an increasing function);

(b) $\lim_{t \to -\infty} H(t) = 0$;

(c) $\lim_{t \to \infty} H(t) = 1$;

(d) $\lim_{t \downarrow s} H(t) = H(s)$ for every $s \in \mathbf{R}$ (in other words, $H$ is right continuous).
tls 


Proof First suppose $H = \tilde{X}$ for some probability space $(\Omega, \mathcal{F}, P)$ and some random variable $X$ on $(\Omega, \mathcal{F})$. Then (a) holds because $s < t$ implies $(-\infty, s] \subseteq (-\infty, t]$. Also, (b) and (d) follow from 2.60. Furthermore, (c) follows from 2.59, completing the proof in this direction.

To prove the other direction, now suppose that $H$ satisfies (a) through (d). Let $\Omega = (0, 1)$, let $\mathcal{F}$ be the collection of Borel subsets of the interval $(0, 1)$, and let $P$ be Lebesgue measure on $\mathcal{F}$. Define a random variable $X$ by

$$X(\omega) = \sup\{t \in \mathbf{R} : H(t) < \omega\} \tag{12.30}$$

for $\omega \in (0, 1)$. Clearly $X$ is an increasing function and thus is measurable (in other words, $X$ is indeed a random variable).

Suppose $s \in \mathbf{R}$. If $\omega \in (0, H(s)]$, then

$$X(\omega) \leq X\bigl(H(s)\bigr) = \sup\{t \in \mathbf{R} : H(t) < H(s)\} \leq s,$$

where the first inequality holds because $X$ is an increasing function and the last inequality holds because $H$ is an increasing function. Hence

$$(0, H(s)] \subseteq \{X \leq s\}. \tag{12.31}$$

If $\omega \in (0, 1)$ and $X(\omega) \leq s$, then $H(t) \geq \omega$ for all $t > s$ (by 12.30). Thus

$$H(s) = \lim_{t \downarrow s} H(t) \geq \omega,$$

where the equality above comes from (d). Rewriting the inequality above, we have $\omega \in (0, H(s)]$. Thus we have shown that $\{X \leq s\} \subseteq (0, H(s)]$, which when combined with 12.31 shows that $\{X \leq s\} = (0, H(s)]$. Hence

$$\tilde{X}(s) = P(X \leq s) = P\bigl((0, H(s)]\bigr) = H(s),$$

as desired.
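The random variable defined in 12.30 is what is often called the quantile function of $H$. Feeding it uniformly distributed inputs is the idea behind inverse transform sampling. The sketch below approximates $X(\omega) = \sup\{t : H(t) < \omega\}$ by bisection. It uses the hypothetical choice $H(t) = 1 - e^{-t}$ for $t \geq 0$ (the exponential distribution function with rate 1) and a hypothetical search interval $[-10^6, 10^6]$. Because this $H$ is continuous and injective on $[0, \infty)$, the exact answer is $X(\omega) = -\ln(1 - \omega)$. The final lines estimate the Lebesgue measure of $\{X \leq s\}$ on a grid and recover $H(s)$, as the proof predicts.

```python
import math

def H(t, alpha=1.0):
    """Exponential distribution function: 0 for t < 0, 1 - e^(-alpha t) for t >= 0."""
    return 0.0 if t < 0 else 1.0 - math.exp(-alpha * t)

def X(omega, lo=-1e6, hi=1e6, iterations=200):
    """Approximate sup{t in R : H(t) < omega} (see 12.30) by bisection.

    Assumes H is increasing with H(lo) < omega <= H(hi). The loop keeps the
    invariant H(lo) < omega <= H(hi), so the supremum always lies in [lo, hi].
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if H(mid) < omega:
            lo = mid
        else:
            hi = mid
    return hi

# {X <= s} = (0, H(s)], so the Lebesgue measure of {X <= s} should be H(s).
grid = [(i + 0.5) / 1000 for i in range(1000)]
s = 1.0
fraction = sum(1 for w in grid if X(w) <= s) / len(grid)
print(fraction, H(s))  # both about 0.632
```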
In the definition below and in the following discussion, $\lambda$ denotes Lebesgue measure on $\mathbf{R}$, as usual.


12.32 Definition density function

Suppose $X$ is a random variable on some probability space. If there exists $h \in L^1(\mathbf{R})$ such that

$$\tilde{X}(s) = \int_{-\infty}^{s} h \, d\lambda$$

for all $s \in \mathbf{R}$, then $h$ is called the density function of $X$.

If there is a density function of a random variable $X$, then it is unique [up to changes on sets of Lebesgue measure 0, which is already taken into account because we are thinking of density functions as elements of $L^1(\mathbf{R})$ instead of elements of $\mathcal{L}^1(\mathbf{R})$]; see Exercise 6 in Chapter 4.

If $X$ is a random variable that has a density function $h$, then the distribution function $\tilde{X}$ is differentiable almost everywhere (with respect to Lebesgue measure) and $\tilde{X}'(s) = h(s)$ for almost every $s \in \mathbf{R}$ (by the second version of the Lebesgue Differentiation Theorem; see 4.19). Because $\tilde{X}$ is an increasing function, this implies that $h(s) \geq 0$ for almost every $s \in \mathbf{R}$. In other words, we can assume that a density function is nonnegative.

In the definition above of a density function, we started with a probability space and a random variable on it. Often in probability theory, the procedure goes in the other direction. Specifically, we can start with a nonnegative function $h \in L^1(\mathbf{R})$ such that $\int_{-\infty}^{\infty} h \, d\lambda = 1$. We use $h$ to define a probability measure on $(\mathbf{R}, \mathcal{B})$ and then consider the identity random variable $X$ on $\mathbf{R}$. The function $h$ that we started with is then the density function of $X$. The following result formalizes this procedure and gives formulas for the mean and standard deviation in terms of the density function $h$.


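12.33 mean and variance of random variable generated by density function

Suppose $h \in L^1(\mathbf{R})$ is such that $\int_{-\infty}^{\infty} h \, d\lambda = 1$ and $h(x) \geq 0$ for almost every $x \in \mathbf{R}$. Let $P$ be the probability measure on $(\mathbf{R}, \mathcal{B})$ defined by

$$P(B) = \int_B h \, d\lambda.$$

Let $X$ be the random variable on $(\mathbf{R}, \mathcal{B})$ defined by $X(x) = x$ for each $x \in \mathbf{R}$. Then $h$ is the density function of $X$. Furthermore, if $X \in \mathcal{L}^1(P)$ then

$$EX = \int_{-\infty}^{\infty} x h(x) \, d\lambda(x),$$

and if $X \in \mathcal{L}^2(P)$ then

$$\sigma^2(X) = \int_{-\infty}^{\infty} x^2 h(x) \, d\lambda(x) - \Bigl(\int_{-\infty}^{\infty} x h(x) \, d\lambda(x)\Bigr)^2.$$

Proof The equation $\tilde{X}(s) = \int_{-\infty}^{s} h \, d\lambda$ holds by the definitions of $\tilde{X}$ and $P$. Thus $h$ is the density function of $X$.

Our definition of $P$ to equal $h \, d\lambda$ implies that $\int_{-\infty}^{\infty} f \, dP = \int_{-\infty}^{\infty} f h \, d\lambda$ for all $f \in \mathcal{L}^1(P)$ [see Exercise 5 in Section 9A]. Thus the formula for the mean $EX$ follows immediately from the definition of $EX$, and the formula for the variance $\sigma^2(X)$ follows from 12.20.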


The following example illustrates the result above with a few especially useful 
choices of the density function h. 


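12.34 Example density functions

• Suppose $h = \mathbb{1}_{[0,1]}$. This density function $h$ is called the uniform density on $[0, 1]$. In this case, $P(B) = \lambda(B \cap [0, 1])$ for each Borel set $B \subseteq \mathbf{R}$. For the corresponding random variable $X(x) = x$ for $x \in \mathbf{R}$, the distribution function $\tilde{X}$ is given by the formula

$$\tilde{X}(s) = \begin{cases} 0 & \text{if } s < 0, \\ s & \text{if } 0 \leq s < 1, \\ 1 & \text{if } s \geq 1. \end{cases}$$

The formulas in 12.33 show that $EX = \frac{1}{2}$ and $\sigma(X) = \frac{1}{2\sqrt{3}}$.

• Suppose $\alpha > 0$ and

$$h(x) = \begin{cases} 0 & \text{if } x < 0, \\ \alpha e^{-\alpha x} & \text{if } x \geq 0. \end{cases}$$

This density function $h$ is called the exponential density on $[0, \infty)$. For the corresponding random variable $X(x) = x$ for $x \in \mathbf{R}$, the distribution function $\tilde{X}$ is given by the formula

$$\tilde{X}(s) = \begin{cases} 0 & \text{if } s < 0, \\ 1 - e^{-\alpha s} & \text{if } s \geq 0. \end{cases}$$

The formulas in 12.33 show that $EX = \frac{1}{\alpha}$ and $\sigma(X) = \frac{1}{\alpha}$.

• Suppose

$$h(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

for $x \in \mathbf{R}$. This density function is called the standard normal density. For the corresponding random variable $X(x) = x$ for $x \in \mathbf{R}$, we have $\tilde{X}(0) = \frac{1}{2}$. For general $s \in \mathbf{R}$, no formula exists for $\tilde{X}(s)$ in terms of elementary functions. However, the formulas in 12.33 show that $EX = 0$ and (with the help of some calculus) $\sigma(X) = 1$.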
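The means and standard deviations claimed above can be sanity-checked numerically from the formulas in 12.33. The sketch below uses composite Simpson’s rule, which is a swapped-in numerical technique and not part of the text. It replaces each integral over $\mathbf{R}$ by an integral over a finite interval, chosen by assumption so that the omitted tails are negligible. The rate $\alpha = 2$ for the exponential density is a hypothetical choice.

```python
import math

def integrate(g, a, b, n=20000):
    """Composite Simpson's rule for g on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def mean_and_sigma(h, a, b):
    """EX and sigma(X) from the formulas in 12.33, integrating only over [a, b]."""
    m = integrate(lambda x: x * h(x), a, b)
    second = integrate(lambda x: x * x * h(x), a, b)
    return m, math.sqrt(second - m * m)

alpha = 2.0  # hypothetical rate for the exponential density

def uniform(x):      # density 1 on [0, 1]; we integrate only over [0, 1]
    return 1.0

def exponential(x):  # alpha * e^(-alpha x) on [0, infinity)
    return alpha * math.exp(-alpha * x)

def normal(x):       # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

print(mean_and_sigma(uniform, 0.0, 1.0))       # about (0.5, 0.2887): 1/2 and 1/(2 sqrt 3)
print(mean_and_sigma(exponential, 0.0, 40.0))  # about (0.5, 0.5): 1/alpha and 1/alpha
print(mean_and_sigma(normal, -12.0, 12.0))     # about (0.0, 1.0)
```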
Weak Law of Large Numbers


Families of random variables all of which look the same in terms of their distribution 
functions get a special name, as we see in the next definition. 


12.35 Definition identically distributed; i.i.d. 


Suppose $(\Omega, \mathcal{F}, P)$ is a probability space.


• A family of random variables on $(\Omega, \mathcal{F})$ is called identically distributed if
all the random variables in the family have the same distribution function.

• More specifically, a family $\{X_k\}_{k\in\Gamma}$ of random variables on $(\Omega, \mathcal{F})$ is called
identically distributed if

$$P(X_j \le s) = P(X_k \le s)$$

for all $j, k \in \Gamma$ and all $s \in \mathbf{R}$.

• A family of random variables that is independent and identically distributed
is said to be independent identically distributed, often abbreviated as i.i.d.


12.36 Example family of random variables for decimal digits is i.i.d. 


Consider the probability space $([0,1], \mathcal{B}, P)$, where $\mathcal{B}$ is the collection of Borel
subsets of the interval $[0,1]$ and $P$ is Lebesgue measure on $([0,1], \mathcal{B})$. For $k \in \mathbf{Z}^+$,
define a random variable $X_k\colon [0,1] \to \mathbf{R}$ by

$$X_k(\omega) = k^{\text{th}}\text{-digit in decimal expansion of } \omega,$$

where for those numbers $\omega$ that have two different decimal expansions we use the
one that does not end in an infinite string of 9s.

Notice that $P(X_k \le 3) = 0.4$ for every $k \in \mathbf{Z}^+$. More generally, the family
$\{X_k\}_{k\in\mathbf{Z}^+}$ is identically distributed, as you should verify.

The family $\{X_k\}_{k\in\mathbf{Z}^+}$ is also independent, as you should verify. Thus $\{X_k\}_{k\in\mathbf{Z}^+}$
is an i.i.d. family of random variables.
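The two verifications requested in 12.36 can be carried out exactly for the first few digits. In this sketch the names `digit`, `prob`, and the cutoff `M` are illustrative and not from the text. The integer $j$ with $0 \le j < 10^M$ stands for the interval $[j/10^M, (j+1)/10^M)$, on which (with the convention about expansions ending in 9s) the first $M$ digits are constant, so counting such $j$ gives the exact Lebesgue measure of any event that depends only on those digits. The independence check covers only the pair $X_1, X_2$ and events of the form $\{X_1 \le a\}$, $\{X_2 \le b\}$, so it illustrates rather than proves the claim.

```python
from fractions import Fraction
from itertools import product

M = 3  # illustrative cutoff: only the first M decimal digits are examined


def digit(j, k, m=M):
    """k-th decimal digit (1 <= k <= m) of every omega in [j/10**m, (j+1)/10**m)."""
    return (j // 10 ** (m - k)) % 10


def prob(event):
    """Exact Lebesgue measure of an event that depends only on digits 1..M.

    Each integer j in range(10**M) labels one interval of length 10**-M."""
    hits = sum(1 for j in range(10 ** M) if event(j))
    return Fraction(hits, 10 ** M)


# Identically distributed (partial check): P(X_k <= 3) is the same for k = 1, ..., M.
marginals = [prob(lambda j, k=k: digit(j, k) <= 3) for k in range(1, M + 1)]
print(marginals)  # [Fraction(2, 5), Fraction(2, 5), Fraction(2, 5)]

# Product rule for X_1 and X_2 on all events {X_1 <= a} and {X_2 <= b}.
ok = all(
    prob(lambda j: digit(j, 1) <= a and digit(j, 2) <= b)
    == prob(lambda j: digit(j, 1) <= a) * prob(lambda j: digit(j, 2) <= b)
    for a, b in product(range(10), repeat=2)
)
print(ok)  # True
```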


Identically distributed random variables have the same expectation and the same 
standard deviation, as the next result shows. 


12.37 identically distributed random variables have same mean and variance 


Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $\{X_k\}_{k\in\Gamma}$ is an identically
distributed family of random variables in $\mathcal{L}^2(P)$. Then

$$EX_j = EX_k \quad\text{and}\quad \sigma(X_j) = \sigma(X_k)$$

for all $j, k \in \Gamma$.
Proof Suppose $j \in \Gamma$. Let $f_1, f_2, \dots$ be the sequence of simple functions
converging pointwise to $X_j$ as constructed in the proof of 2.89. The Dominated Convergence
Theorem (3.31) implies that $EX_j = \lim_{n\to\infty} Ef_n$. Because of how each $f_n$ is
constructed, each $Ef_n$ depends only on $n$ and the numbers $P(c \le X_j < d)$ for $c < d$.
However,

$$P(c \le X_j < d) = \lim_{n\to\infty}\Big(P\big(X_j \le d - \tfrac{1}{n}\big) - P\big(X_j \le c - \tfrac{1}{n}\big)\Big)$$

for $c < d$. Because $\{X_k\}_{k\in\Gamma}$ is an identically distributed family, the numbers above
on the right are independent of $j$. Thus $EX_j = EX_k$ for all $j, k \in \Gamma$.

The family $\{X_k^2\}_{k\in\Gamma}$ is also identically distributed, because for $s \ge 0$ the number
$P(X_k^2 \le s) = P(-\sqrt{s} \le X_k \le \sqrt{s})$ is determined by the distribution function of $X_k$
(by a limit argument like the one above). Apply the result from the paragraph above to this family
and use 12.20 to conclude that $\sigma(X_j) = \sigma(X_k)$ for all $j, k \in \Gamma$.
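The key step of the proof above, that each $Ef_n$ can be computed from the distribution function alone, can be illustrated numerically. This sketch assumes a nonnegative random variable with a continuous distribution function `F` and uses a simplified dyadic approximation, $f_n(x) = \lfloor 2^n x \rfloor / 2^n$ for $x < n$ and $f_n(x) = n$ for $x \ge n$. The construction in 2.89 also handles negative values, so treat this as a hypothetical special case. Because the function receives only `F`, any two identically distributed random variables produce the same numbers $Ef_n$. The exponential distribution function with $\alpha = 1$ (so $EX = 1$) is an illustrative choice.

```python
import math


def expected_simple(F, n):
    """E f_n for a nonnegative random variable X with continuous distribution
    function F, where f_n(x) = floor(2**n * x) / 2**n if x < n and f_n(x) = n
    if x >= n (a simplified version of the dyadic construction).

    Only values of F are used: for continuous F, P(c <= X < d) = F(d) - F(c)."""
    step = 2.0 ** -n
    total = 0.0
    for m in range(1, n * 2 ** n + 1):
        c, d = (m - 1) * step, m * step
        total += c * (F(d) - F(c))  # f_n equals c on the event {c <= X < d}
    return total + n * (1.0 - F(n))  # f_n equals n on the event {X >= n}


def exponential_cdf(s):
    """Distribution function of the exponential density with alpha = 1."""
    return 1.0 - math.exp(-s) if s >= 0 else 0.0


approximations = [expected_simple(exponential_cdf, n) for n in range(1, 11)]
print(approximations[-1])
```

Since $f_1 \le f_2 \le \cdots$, the approximations increase with $n$ toward $EX = 1$.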


The next result has the nicely intuitive interpretation that if we repeat a random 
process many times, then the probability that the average of our results differs from 
our expected average by more than any fixed positive number $\varepsilon$ has limit 0 as we
increase the number of repetitions of the process. 


12.38 Weak Law of Large Numbers 


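Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $\{X_k\}_{k\in\mathbf{Z}^+}$ is an i.i.d. family of
random variables in $\mathcal{L}^2(P)$, each with expectation $\mu$. Then

$$\lim_{n\to\infty} P\Big(\Big|\frac{1}{n}\sum_{k=1}^{n} X_k - \mu\Big| \ge \varepsilon\Big) = 0$$

for all $\varepsilon > 0$.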


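Proof Because the random variables $\{X_k\}_{k\in\mathbf{Z}^+}$ all have the same expectation and
same standard deviation, by 12.37 there exist $\mu \in \mathbf{R}$ and $s \in [0,\infty)$ such that

$$EX_k = \mu \quad\text{and}\quad \sigma(X_k) = s$$

for all $k \in \mathbf{Z}^+$. Thus

$$E\Big(\frac{1}{n}\sum_{k=1}^{n} X_k\Big) = \mu \quad\text{and}\quad \sigma^2\Big(\frac{1}{n}\sum_{k=1}^{n} X_k\Big) = \frac{1}{n^2}\,\sigma^2\Big(\sum_{k=1}^{n} X_k\Big) = \frac{s^2}{n}, \tag{12.39}$$

where the last equality follows from 12.22 (this is where we use the independent part
of the hypothesis).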

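Now suppose $\varepsilon > 0$. In the special case where $s = 0$, all the $X_k$ are almost surely
equal to the same constant function and the desired result clearly holds. Thus we
assume $s > 0$. Let $t = \sqrt{n}\,\varepsilon/s$ and apply Chebyshev's inequality (12.21) with this
value of $t$ to the random variable $\frac{1}{n}\sum_{k=1}^{n} X_k$. By 12.39, this random variable has
expectation $\mu$ and standard deviation $s/\sqrt{n}$, so $t \cdot s/\sqrt{n} = \varepsilon$ and $1/t^2 = s^2/(n\varepsilon^2)$.
Hence

$$P\Big(\Big|\frac{1}{n}\sum_{k=1}^{n} X_k - \mu\Big| \ge \varepsilon\Big) \le \frac{s^2}{n\varepsilon^2}.$$

Taking the limit as $n \to \infty$ of both sides of the inequality above gives the desired
result.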
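To see the bound from the proof in a concrete case, take the $X_k$ to be i.i.d. Bernoulli random variables with $P(X_k = 1) = p$ and $P(X_k = 0) = 1 - p$ (an illustrative choice, not from the text), so $\mu = p$ and $s^2 = p(1-p)$. Then $\sum_{k=1}^n X_k$ has the binomial distribution, and the probability in 12.38 can be computed exactly with rational arithmetic and compared with the Chebyshev bound $s^2/(n\varepsilon^2)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def tail_prob(n, p, eps):
    """Exact P(|S_n/n - p| >= eps), where S_n ~ Binomial(n, p) is the sum of
    n i.i.d. Bernoulli(p) random variables."""
    return sum(
        (comb(n, k) * p**k * (1 - p) ** (n - k)
         for k in range(n + 1)
         if abs(Fraction(k, n) - p) >= eps),
        Fraction(0),
    )


def chebyshev_bound(n, p, eps):
    """The bound s^2 / (n eps^2) from the proof, with s^2 = p(1 - p)."""
    return p * (1 - p) / (n * eps**2)


p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 50, 100, 200):
    print(n, float(tail_prob(n, p, eps)), float(chebyshev_bound(n, p, eps)))
```

For $n = 10$ every outcome except $S_{10} = 5$ lies in the event, so the exact probability is $1 - \binom{10}{5}/2^{10} = 193/256$, while the Chebyshev bound is $2.5$. As $n$ grows, the exact probabilities decrease toward $0$ and stay below the bound; the bound is far from sharp, but the proof only needs it to tend to $0$.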
EXERCISES 12 


1. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $A \in \mathcal{F}$. Prove that $A$ and $\Omega \setminus A$
are independent if and only if $P(A) = 0$ or $P(A) = 1$.


2. Suppose $P$ is Lebesgue measure on $[0,1]$. Give an example of two disjoint Borel
subsets $A$ and $B$ of $[0,1]$ such that $P(A) = P(B) = \frac{1}{2}$, $[0, \frac{1}{2}]$ and $A$ are
independent, and $[0, \frac{1}{2}]$ and $B$ are independent.


3. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $A, B \in \mathcal{F}$. Prove that the following
are equivalent:

• $A$ and $B$ are independent events.

• $A$ and $\Omega \setminus B$ are independent events.

• $\Omega \setminus A$ and $B$ are independent events.

• $\Omega \setminus A$ and $\Omega \setminus B$ are independent events.


4. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $\{A_k\}_{k\in\Gamma}$ is a family of events.
Prove the family $\{A_k\}_{k\in\Gamma}$ is independent if and only if the family $\{\Omega \setminus A_k\}_{k\in\Gamma}$
is independent.


5. Give an example of a probability space $(\Omega, \mathcal{F}, P)$ and events $A, B_1, B_2$ such
that $A$ and $B_1$ are independent, $A$ and $B_2$ are independent, but $A$ and $B_1 \cup B_2$
are not independent.


6. Give an example of a probability space $(\Omega, \mathcal{F}, P)$ and events $A_1, A_2, A_3$ such
that $A_1$ and $A_2$ are independent, $A_1$ and $A_3$ are independent, and $A_2$ and $A_3$
are independent, but the family $A_1, A_2, A_3$ is not independent.


7. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space, $A \in \mathcal{F}$, and $B_1 \subset B_2 \subset \cdots$ is an
increasing sequence of events such that $A$ and $B_n$ are independent events for
each $n \in \mathbf{Z}^+$. Show that $A$ and $\bigcup_{n=1}^{\infty} B_n$ are independent.


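8. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $\{A_t\}_{t\in\mathbf{R}}$ is an independent family
of events such that $P(A_t) < 1$ for each $t \in \mathbf{R}$. Prove that there exists a sequence
$t_1, t_2, \dots$ in $\mathbf{R}$ such that $P\big(\bigcap_{n=1}^{\infty} A_{t_n}\big) = 0$.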


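9. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $B_1, \dots, B_n \in \mathcal{F}$ are such that
$P(B_1 \cap \cdots \cap B_n) > 0$. Prove that

$$P(A \cap B_1 \cap \cdots \cap B_n) = P(B_1)\cdot P_{B_1}(B_2) \cdots P_{B_1\cap\cdots\cap B_{n-1}}(B_n)\cdot P_{B_1\cap\cdots\cap B_n}(A)$$

for every event $A \in \mathcal{F}$.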


10. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $A \in \mathcal{F}$ is an event such that
$0 < P(A) < 1$. Prove that

$$P(B) = P_A(B)\cdot P(A) + P_{\Omega\setminus A}(B)\cdot P(\Omega \setminus A)$$

for every event $B \in \mathcal{F}$.


11. Give an example of a probability space $(\Omega, \mathcal{F}, P)$ and $X, Y \in \mathcal{L}^2(P)$ such
that $\sigma^2(X + Y) = \sigma^2(X) + \sigma^2(Y)$ but $X$ and $Y$ are not independent random
variables.


12. Prove that if $X$ and $Y$ are random variables (possibly on two different probability
spaces) and $\tilde{X} = \tilde{Y}$, then $P_X = P_Y$.


13. Suppose $H\colon \mathbf{R} \to (0,1)$ is a continuous one-to-one function satisfying conditions
(a) through (d) of 12.29. Show that the function $X\colon (0,1) \to \mathbf{R}$ produced
in the proof of 12.29 is the inverse function of $H$.


14. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $X$ is a random variable. Prove
that the following are equivalent:

• $\tilde{X}$ is a continuous function on $\mathbf{R}$.

• $\tilde{X}$ is a uniformly continuous function on $\mathbf{R}$.

• $P(X = t) = 0$ for every $t \in \mathbf{R}$.

• (XoX)(s) = s for all s ∈ R.

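15. Suppose $\alpha > 0$ and

$$h(x) = \begin{cases} 0 & \text{if } x < 0,\\ \alpha^2 x e^{-\alpha x} & \text{if } x \ge 0.\end{cases}$$

Let $P = h\,d\lambda$ and let $X$ be the random variable defined by $X(x) = x$ for $x \in \mathbf{R}$.

(a) Verify that $\int_{-\infty}^{\infty} h\,d\lambda = 1$.

(b) Find a formula for the distribution function $\tilde{X}$.

(c) Find a formula (in terms of $\alpha$) for $EX$.

(d) Find a formula (in terms of $\alpha$) for $\sigma(X)$.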


16. Suppose $\mathcal{B}$ is the $\sigma$-algebra of Borel subsets of $[0,1)$ and $P$ is Lebesgue measure
on $([0,1), \mathcal{B})$. Let $\{e_k\}_{k\in\mathbf{Z}^+}$ be the family of functions defined by the fourth
bullet point of Example 8.51 (notice that $k = 0$ is excluded). Show that the
family $\{e_k\}_{k\in\mathbf{Z}^+}$ is i.i.d.


17. Suppose $\mathcal{B}$ is the $\sigma$-algebra of Borel subsets of $(-\pi, \pi]$ and $P$ is Lebesgue
measure on $((-\pi, \pi], \mathcal{B})$ divided by $2\pi$. Let $\{e_k\}_{k\in\mathbf{Z}\setminus\{0\}}$ be the family of
trigonometric functions defined by the third bullet point of Example 8.51 (notice
that $k = 0$ is excluded).

(a) Show that $\{e_k\}_{k\in\mathbf{Z}\setminus\{0\}}$ is not an independent family of random variables.

(b) Show that $\{e_k\}_{k\in\mathbf{Z}\setminus\{0\}}$ is an identically distributed family.